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This  research  addresses  algorithmic  approaches  for  solving  two  different,  but  related,  types  of  optimization 
problems.  Firstly,  the  research  considers  the  solution  of  a  specific  type  of  assignment  problem  using  con¬ 
tinuous  methods.  Secondly,  the  research  addresses  solving  systems  of  inequalities  (and  equalities)  in  a 
least  square  sense.  The  specific  assignment  problem  has  piece-wise  linear  additive  separable  server  cost 
functions,  which  are  continuous  everywhere  except  at  zero,  the  point  of  discontinuity  for  the  {0,1}  assign¬ 
ment  condition.  Continuous  relaxation  of  the  {0,1}  constraints  yields  a  linear  programming  problem.  Solv¬ 
ing  the  dual  of  the  linear  programming  problem  yields  the  complementarity  conditions  for  a  primal  solution, 
a  system  of  linear  inequalities  and  equalities.  Adding  equations  to  this  system  to  enforce  a  {0,1}  solution 
in  the  relaxed  solution  set  yields  an  augmented  system,  not  necessarily  linear.  Methods  to  solve  this  sys¬ 
tem,  a  system  of  linear  inequalities  and  non-linear  equations,  in  a  least  square  sense  are  developed.  The 
specific  assignment  problem  is  a  variation  of  problems  which  are  amenable  to  strong  continuous  relaxation, 
in  that  the  solution  set  of  the  relaxed  problem  has  been  shown,  experimentally,  to  often  contain  a  {0,1} 
solution.  However,  if  there  are  a  large  number  of  variables,  efficient  continuous  (non-combinatoric) 
methods  are  needed  to  locate  {0,1}  solutions,  if  such  exist.  This  work  addresses  methods  to  find  {0,1} 
solutions  using  a  least  square  formulation  for  solving  systems  of  inequalities. 

There  is  a  large  body  of  work  dealing  with  nondifferentiable  optimization,  but  the  kind  of  nondifferentiability 
posed  by  inequalities  is  of  a  special  type.  By  considering  a  least  square  formulation  of  the  problem; 

(a)  meaningful  solutions  can  be  found  even  in  the  infeasible  case  which  naturally  arises  in 
applications  such  as  sequential  quadratic  programming; 

(b)  the  problem  becomes  one  (but  not  twice)  differentiable; 

(c)  the  generalized  second  order  differential  set  has  favorable  properties  that  allow  generalization 
of  classical  second  order  non-linear  least  square  algorithms; 

(d)  the  fundamental  computational  subproblem,  solving  a  system  of  linear  equalities,  is  efficiently 
solvable; 

(e)  the  problem  becomes  numerically  stable  (in  the  sense  of  Robinson). 


Common  algorithmic  approaches  to  solve  nonlinear  least  square  problems  are  adapted  to  solve  systems 
of  inequalities.  Local  and  global  convergence  results  are  developed,  using  properties  of  the  Clarke 
generalized  subdifferentiai  and  Jacobian.  Rates  of  convergence  are  analyzed.  Applications  of  the 
algorithms  for  solving  the  piece-wise  linear  assignment  subproblem  are  developed  and  analyzed. 
Application  of  the  algorithms  for  solving  linear  programming  problems,  and  linear  and  convex 
complementarity  problems  are  described. 
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Notation 


1)  3?''  denotes  the  real  n-dimensional  space.  Let  1"  denote  the  vector  of  all  ones  in  SR". 

2)  SR"  denotes  the  positive  orthant  in  SR". 

3)  Let  5  be  a  subset  of  SR". 

-  convex  hull  S  denotes  the  set  of  all  finite  convrx  combinations  of  elements  of  S. 

-  {d  1 1  e  S'  implies  x  +  Ad  e  S  for  all  A  >  0}  is  the  set  of  directions  of  recession  of  S. 

-  Let  6s(ar)  be  the  indicator  function  of  the  set,  S.  6s{x)  :=  0,  if  x  6  S;  otherwise,  6s{x)  :=  -t- oc. 

4)  Let  (•,  •)  denote  the  inner  product  on  SR".  Let  S-“-  ;=  {y(  (x,  y)  =  0,  for  all  x  e  S}  denote  the  orthogonal 

complement  of  the  set  S.  Given  a  €  SR",  a  scalar  a,  let  H  :=  {x|  (a,x)  =  a}  denote  the  hyperplane 

associated  with  a  and  q. 

5)  Let  K"  C  SR"  be  a  convex  cone,  then  denote  K*  ■=  {j/|  (x,  y)  <  0  for  all  i  e  K}.  K*  is  called  the  polar 
of  the  cone  K. 

6)  Let  C^[SR",SR"*|  be  the  set  of  all  continuously  once  differential  functions  from  SR"  to  SR*".  Similarly, 
C^[SR",  SR”*]  is  the  set  of  all  continuously  twice  differential  functions  from  SR"  to  SR"*. 

7)  Let  X  e  SR".  Let  5  c  SR".  All  vectors  are  written  as  column  vectors. 

-  ||x||  denotes  the  Euclidean  norm  of  x. 

-  dist  (x,  5)  :=  inf{||x  -  y||  |  y  €  5}. 

-  iV(x,  r)  {y  j  jjx  -  y||  <  r}  denotes  an  open  neighborhood  of  radius  r  about  i. 

-  x+  denotes  the  Euclidean  projection  of  x  onto  SR" . 

-  x_  :=  X  +  x+. 

8)  For  a  linear  map  G, 

-  denote  ||G||  as  the  induced  operator  norm, 

-  denote  A/'(G)  as  the  null  space  of  G. 

9)  A  function  /  is  Lipschitz  of  order  p  of  rank  7  >  0  on  a  set  5,  if  all  x,y  ^  S  satisfy;  ||/(x)  -  /(y)||  < 
ilk-yF- 

10)  Let  A  be  an  m  by  n  matrix  and  let  /  C  {1, 2, . . . ,  m}. 

-  Denote  1 1\  to  mean  the  cardinality  of  the  set  I. 

-  Ai  denotes  the  p  by  n  matrix,  where  p  =  j/|,  formed  from  the  p  rows  of  A  indexed  by  I. 

~  A^  denotes  the  pseudo-inverse  of  the  matrix  A,  the  generalized  inverse  giving  a  least  norm,  least  square 
solution. 

-Typically,  denote  index  i,  as  the  row  of  a  matrix. 

-Typically,  denote  index  j,  as  the  column  of  a  matrix. 

11)  For  a  matrix  A,  an  n  by  n  matrix,  denote  l|A||f,  the  Frobenius  norm  of  A, 
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12)  Let  5  be  a  subset  of  3?".  Usually  /  is  a  single-valued  functional  mapping  of  S  into  3?,  and  F  is  a 
single- valued  mapping  from  5  into  S?"".  A  property  of  /  is  satisfied  a.e.  on  S.  if  this  property  is  satisfied 
everywhere  on  S  except  on  a  set  of  measure  0. 

13)  The  Clarke  subdifferential  set  of  /  at  x  is  denoted  0f{x).  The  gradient  of  /  at  x  is  denoted  V/(x). 

14)  The  Clarke  generalized  Jacobian  set  of  F  at  x  is  denoted  dF{x).  The  Jacobian  of  F(x)  is  denoted 
JF{x).  Similarly,  the  Clarke  generalized  Jacobian  set  of  V/(x)  (sometimes  referred  to  as  a  generalized 
Hessian)  is  denoted  dVf{x).  The  Hessian  of  /  at  x  is  denoted  V^/(x). 

15)  In  algorithms,  iterates  are  typically  denoted  with  the  index,  k.  Greek  lower  case  letters,  such  as  a.  /i,  -v.  A, 
are  usually  positive  scalars.  Direction  vectors  are  usually  denoted  d,  and  the  algorithm  scaled  step  is 
usually  s.  Distinquished  points,  such  as  solution  or  accumulation  points  are  usually  denoted  by  a  star 
convention,  such  as  x*. 

16)  The  notation,  “tittle  o”  means  a  =  o{6)  4=>  j  — *  0  as  <5  — *  0. 

17)  A/fc  and  AM*  are  symbols  used  in  development  of  trust  region  algorithms,  where  at  each  iterate  k, 

■-  /(xfc)  -  f{xk  +  Sk)  and  AMk  :=  Aft(O)  -  Mk(sk). 


1  Introduction 


1.1  Types  of  Optimization  Problems 

Mathematical  optimization  methods  include  two  distinct  types,  the  combinatoric  and  the  continuous  meth¬ 
ods.  Unfortunately,  practicing  operations  researchers  often  face  problems  which  are  neither  purely  continuous 
nor  combinatoric.  For  these  problems,  one  ''annot  directly  use  the  strong  algorithms  for  continuous  optimiza¬ 
tion,  nor  the  discrete  methods.  One  such  hybrid  problem  is  the  assignment  problem,  a  useful  formulation 
for  cost  minimization  problems.  Assignment  problems  involve  assigning  jobs  to  servers,  where  one  wishes  to 
match  servers  and  jobs  based  on  minimizing  total  costs.  For  a  linear  assignment,  each  server  cost  function  is 
linear.  Realistic  assignment  problems  cannot  be  adequately  modeled  by  formulation  as  a  linear  assignment 
problem,  a  computationally  solvable  problem.  Practicing  operations  researchers  look  for  methods  allowing 
representation  of  non-linear  properties,  when  the  server  cost  functions  are  highly  non-linear.  For  example, 
piece-wise  linear  approximations  offer  greater  accuracy.  Another  typical  modeling  requirement  may  be  sat¬ 
isfied  by  approximation  using  a  cost  function  having  a  flat  minimum  charge  and  linear  rates  beyond  the 
minimum  charge.  The  added  accuracy  gained  by  allowing  piece-wise  linear  server  cost  functions,  ewn  just 
two  pieces,  is  an  attractive  capability.  The  assignment  problem  having  additively  separable  piece- wise  linear 
server  cost  functions  can  be  solved  by  a  strong  relaxation  into  a  linear  program  and  then  using  Gomory 
cutting  plane  methods  for  solution  of  {0, 1 }  assignment  variables.  However,  these  combinatoric  methods  are 
unsatisfactory  for  large  scale  problems.  Recent  work  addressing  improved  large  scale  methods  include  the 
work  of  Conn  and  Comuejols[1990),  where  the  problem  is  simplified  by  solving  for  the  unconstrained  dual 
using  a  projected  gradient  to  derive  search  directions.  Their  work  does  not  address  specific  methods  for 
finding  {0, 1}  primal  solutions  generated  by  a  dual  solution,  which  is  formidable  in  the  case  of  large  scale 
problems.  We  develop  a  method,  based  on  an  algorithm  for  finding  a  least  square  solution  of  a  system  of 
linear  inequalities  and  non-linear  equations,  along  with  using  common  Gomory  cutting  planes  to  solve  for 
the  {0, 1}  solutions. 

Recent  work  addressing  least  square  methods  for  solving  systems  of  inequalities  centers  on  the  research 
of  Han(1980),  Burke[1983],  and  Han  and  Burke  [1986).  Han(1980)  developed  an  algorithm  for  solving  systems 
of  linear  inequalities  using  a  search  direction  derived  as  the  projection  of  the  negative  of  the  residuals  onto  the 
subspace  spanned  by  the  coefficients  of  the  unmet  and  active  constraints.  Further,  he  showed  the  existence 
of  a  neighborhood  of  any  solution,  where  the  active  and  unmet  constraints  of  the  current  iterate  are  a  subset 
of  the  active  and  unmet  constraints  of  the  next  iterate.  This  neighborhood  is  now  called  an  identification 
neighborhood.  This  property  assured  finite  termination  of  his  algorithm. 

In  Burke[1983|  and  Burke  and  Han[1986|,  a  function  property  called  K-regular  is  defined,  which  guar¬ 
antees  a  well-defined  search  direction  at  each  iterate.  While  this  class  of  functions  may  seem  quite  general, 
Burke  and  Han[1986|  cite  “seemingly  well-behaved  situations”  which  are  excluded  from  their  K-regular  class 
of  functions.  Using  this  property,  they  extended  the  least  square  algorithm  for  solving  finite  dimensional 
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systems  of  inequalities  and  equalities  of  K-regular  functions  using  the  same  type  of  projection  as  for  the 
linear  case.  Burke[1983]  showed  sufficiency  conditions  for  global  convergence  of  this  “Gauss- Newton”  type 
search  direction,  as  well  as  the  steepest  descent  direction,  using  Armijo  line  search  conditions  to  solve  sys¬ 
tems  of  inequalities  of  uniformly  K-regular  C'  functions  in  a  least  square  sense.  For  the  non-linear  case, 
they  used  the  Mangasarian  Promowitz  constraint  qualification  condition,  defined  in  section  6,  to  assure  local 
q-quadratic  convergence  in  the  special  identification  neighborhood  of  a  stationary  point  having  zero  residual. 

This  dissertation  addresses  solving  finite  systems  of  inequalities  of  convex  functions  and  equalities 
of  functions,  in  a  least  square  sense.  The  advantage  of  first  examining  the  class  of  convex  functions 
is  that  the  strong  theoretical  properties  are  revealed,  suggesting  the  needed  properties  to  generalize  for 
the  non-convex  case.  Further,  for  the  assignment  application  subproblem  addressed  here,  the  inequalities 
satisfy  this  convex  condition.  This  dissertation  extends,  to  the  inequality  case,  the  traditional  methods 
used  to  solve  the  equality  case,  including  the  Gauss-Newton,  Levenberg-Marquardt,  Newton’s  methods,  and 
quasi-Newton  updates.  We  derive  new  adaptations  of  these  methods  to  handle  solving  inequalities  in  a  least 
square  sense.  We  show  the  sufficient  local  convergence  properties  to  assure  various  convergence  rates,  such 

as  q-quadratic,  or  q-superlinear,  or  q-linear  convergence.  Further,  we  analyze  and  adapt  the  other  major 
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global  convergence  method,  the  trust  region  method,  to  solve  systems  of  inequalities  in  a  least  square  sense. 
We  show  sufficient  properties  for  global  convergence  of  our  modified  trust  region  algorithm  to  handle  solving 
systems  of  inequalities  in  a  least  square  sense. 

Another  important  result  of  this  dissertation  is  that  we  have  unified  and  greatly  simplified  the  treatment 
of  the  essential  properties  which  govern  global  and  local  convergence  when  solving  systems  of  inequalities  in 
a  least  square  sense.  For  local  convergence,  the  key  property  of  the  least  square  objective  function  which 
unifies  the  underlying  theory,  regardless  of  the  algorithm,  is  the  special  approximating  property  that  the 
generalized  Clarke  differential  of  the  least  square  objective  function  possesses  in  the  special  identification 
neighborhood  of  a  designated  point.  This  special  approximating  property  assures  that  the  second  order 
Taylor  series  expansion  (based  on  generalized  differential  constructs)  about  each  point  i,  sufficiently  close 
to  a  given  point  x,,  models  the  objective  function  at  x,  perfectly  in  the  case  of  linear  inequalities  and  with 
second  order  accuracy  in  the  nonlinear  convex  case.  In  contrast,  a  “true”  analytic  property  would  also  assure 
that  the  expansion  about  x,  models  each  point  x.  The  key  global  property  is  that  the  gradient  of  the  least 
square  objective  function  for  inequalities  is  uniformly  Lipschitz  of  some  rank  7  on  the  lower  level  set  of  the 
first  iterate.  The  model  trust  region  method,  as  with  the  Armijo  line  search,  is  shown  to  converge  without 
any  added  conditions,  because  of  the  uniform  Lipschitzian  property  of  the  gradient  of  the  objective  function. 
With  our  results,  we  demonstrate  that  one  can  use  existing  algorithms,  with  minor  adaptations,  to  solve  for 
least  square  solutions  of  systems  of  convex  inequalities  and  non-linear  equations. 


2 


1.2  Overview  of  Thesis 

Section  1.3  in  the  Introduction  gives  background  material  on  generalized  differentiability,  as  in  Clarke[1989], 
as  well  as  other  results  needed  for  the  analysis  of  algorithm  convergence  properties.  These  properties  are 
the  foundation  of  the  dissertation  development.  Section  1 .4  gives  a  description  of  a  somewhat  remarkable, 
yet  simple,  algorithm  developed  by  S.P.  Han  to  solve  linear  systems  of  inequalities  in  a  least  square  sense. 
The  method  to  find  search  directions  in  Han’s  algorithm  uses  a  singular  value  decomposition  (SVD).  Using 
the  result  of  Bramley[1992j,  that  the  search  directions  generated  by  QR  factorization  with  column  pivoting 
are  bounded  by  the  minimum  norm  solution  provided  by  SVD,  it  follows  that  the  method  also  converges 
using  the  easier-to-compute  search  direction  using  a  QR  factorization  with  colunui  pivoting.  Since  the  QR 
factorization  with  column  pivoting  is  a  finite  computational  process,  and  SVD  is  a  convergent  iterative 
process,  this  change  allows  the  method  to  be  used  as  a  practical  algorithm.  The  ideas  in  this  algorithm  for 
linear  inequalities  are  a  motivation  to  consider  extension  to  other  common  minimization  algorithms. 

Section  1.5  in  the  Introduction  describes  background  on  algorithms  for  solving  systems  of  equations  in 
a  least  square  sense. 

Section  2  develops  the  assignment  problem,  starting  with  the  original  primal  problem  in  {0, 1 }  v^iables, 
strong  relaxation  to  a  linear  programming  primal,  conversion  to  the  relaxed  dual,  and  characterization  of 
relaxed  primal  solutions  from  a  dual  solution.  An  example  problem  is  given  in  section  2.4,  and  section  2.5 
discusses  methods  to  solve  for  {0, 1}  solutions  within  the  solution  set  generated  by  a  relaxed  dual  solution. 

The  treatment  of  least  square  solution  of  inequalities  and  equalities  is  done  in  sections  3  and  5.  In  section 
3.1,  we  develop  the  least  square  formulation  for  solving  systems  of  linear  inequalities  and  nonlinear  equations. 
In  section  3.2,  we  state  the  definition  of  stability  of  a  system  of  inequalities,  based  on  Robinson[l976l.  We 
show  that  the  least  square  formulation  for  solving  a  system  of  inequalities  is  numerically  stable  to  small 
perturbations  or  inaccuracies  in  computation,  using  this  definition  of  stability.  In  this  section,  we  give  an 
example  of  a  system  of  inequalities  which  is  not  stable,  according  to  Robinson’s  definition  of  stability;  but 
the  formulation  of  this  system  as  a  least  square  problem  yields  a  system  of  inequalities  which  is  stable. 

In  section  3.3,  the  identification  neighborhood  of  a  point  is  defined.  The  special  approximating  property 
of  the  generalized  differential  constructs  of  the  least  square  objective  function  is  defined  and  verified  for 
the  case  of  linear  inequalities.  In  this  case,  the  Taylor  series  based  on  second  order  generalized  differential 
construct  models  the  least  square  objective  function  perfectly  in  an  identification  neighborhood. 

In  section  4.1,  we  develop  the  Gauss-Newton  and  Levenberg-Marquardt  like  algorithms  and  show  local 
convergence  properties.  We  verify  global  convergence  properties  for  the  Armijo  line  search  and  develop  and 
analyze  global  trust  region  methods  using  a  generalized  Hessian  model  function  in  section  4.2.  In  section  4.3. 
we  analyze  the  least  square  objective  function  properties,  which  assure  that  the  iterates,  for  which  function 
decreases,  stay  bounded.  This  assures  the  existence  of  an  accumulation  point  among  the  iterates.  This 
function  property  is  related  to  coercive  behavior,  which  is  defined  in  this  section. 

In  section  5,  we  develop  the  case  of  systems  of  inequalities  of  convex  functions  and  nonlinear  equa- 
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tions.  In  this  section,  we  show  the  special  approximating  property  of  the  Taylor  series  based  on  generalized 
differential  second  order  constructs  of  the  least  square  objective  function  gives  second  order  approximation 
in  an  identification  neighborhood  of  a  point.  FVom  this,  in  section  6,  we  are  able  to  develop  and  prove 
local  convergence  for  the  Gauss-Newton  like  algorithm,  Levenberg-Marquardt  like  algorithm,  the  Newton¬ 
like  algorithm  and  the  special  quasi-Newto.i  symmetric  secant  update  algorithm  for  non-linear  least  square 
problems.  We  discuss  global  convergence  results,  noting  the  similarity  to  linear  case. 

In  section  7,  we  develop  and  verify  applications,  specifically  the  solution  of  the  assignment  problem 
of  section  2.  We  develop  and  analyze  a  method  to  solve  this  problem,  based  on  Han's  algorithm,  and  on 
the  algorithm  for  solving  systems  of  non-linear  equations  and  linear  inequalities  in  a  least  square  sense.  In 
section  7.2,  we  show  the  application  of  using  the  least  squares  method  to  solve  linear  programs,  specifically, 
the  relaxed  linear  program  of  the  assignment  problem.  Thus,  we  show  how  one  could  use  this  method  to  solve 
both  subproblems  of  the  assignment  problem.  In  section  8,  examples  of  the  Gauss-Newton  like  algorithm 
for  solving  the  sample  piece- wise  linear  assignment  problem  in  section  2  are  shown. 

In  section  9,  conclusions  and  recommendations  for  further  study  are  given. 

1.3  Background  on  Generalized  Differentials  " 

We  use  properties  of  the  Clarke[1980|  generalized  Jacobian  and  subgradient  in  this  research.  Given  a  function. 
F,  where  F  ;  J?"  >-*  3?'",  define  F  to  be  Lipschitz  of  rank  7  of  order  1  on  some  neighborhood  of  x,  if  for  all 
y  in  this  neighborhood, 

||F{x)-F(y)|l<7l|a:-y||. 

The  Clarke  subgradient  set  is  defined  for  finite  dimensional  functions,  which  is  convenient  for  calculations. 
Let  /  :  0?"  t— » 3?  be  Lipschitz  of  order  1  of  rank  7  in  a  neighborhood  of  x  .  Let  S  be  a  set  of  Lebesque  measure 
0  in  3?",  and  let  H  be  the  set  of  points  where  /  fails  to  be  differentiable.  By  Rademacher’s  Theorem,  has 
measure  0.  Then  the  Clarke  generalized  subgradient  of  /  at  x  is  the  set  d/(x),  where 

df{x)  =  convex  hull  (lim  V/(xi)  |  x;  — »  x,  x;  ^  5,  xj  ^  fi}. 

If  /  is  a  finite  convex  function,  then  the  Clarke  generalized  subgradient  set  is  the  same  as  the  convex 
subgradient  set  defin^  as: 

df(x)  :=  {x*\f{y)  -  /(x)  <  (x*,y  -  x),  for  all  y  e  3?"}. 

Now,  consider  F  :  3i"  >-«  3?"*,  and  assume  that  F  is  Lipschitz  of  rank  7  of  order  1  on  some  neighborhood  of 
X.  Let  Q  be  the  set  of  points  where  F  fails  to  be  differentiable.  Denote  the  Jacobian  of  F  at  xi  as  JF(xi). 
for  I;  ^  n,  where  I  e  /,  some  index  set.  Then  the  Clarke  generalized  Jacobian  of  F  at  i  is  the  set  dF{x), 
where 

dF{x)  ;=  convex  hull  {lim»7F(x()  | xj  — ‘  x,  X(  ^  D,  /  €  /}. 
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dF{x)  is  a  set  of  all  m  x  n  matrices  which  are  convex  combinations  of  a  finite  number  of  limits  of  a  sequence 
of  Jacobians,  JF[xi),  as  xj  approaches  x. 

Some  important  properties  of  the  generalized  subgradient  and  Jacobian: 

1 .  df{x)  is  a  nonempty  convex  compact  subset  of  SR". 

2.  dF{x)  is  a  nonempty  convex  compact  subset  of  8?'"”, 

3.  If  /  is  differentiable  at  x,  then  df(x)  =  Vfix). 

4.  If  F  is  differentiable  at  x,  then  df{x)  ~  JF{x). 

5.  dF  is  an  upper  semicontinuous  mapping  at  x.  That  is,  for  any  5  >  0,  there  is  a  e  >  0.  such  that  for 
y  €  N{x,  e),  a  neighborhood  of  x,  then: 

dF{y)  C  dF(x)  + 

where  Bmn  denotes  the  unit  ball  in  jR"*". 

6.  The  Vector  Mean  Value  Theorem  for  Generalized  Jacobian:  Let  F  be  Lipschitz  on  an  open  convex  set 
U  in  5R”,  and  let  u,v  €  U.  Then  one  has: 

F{v)  -  F{u)  6  convex  hull  dF{[u,v]){v  -  u), 
where  [u,  v\  represents  the  convex  hull  of  {u,  v}- 

7.  The  Jacobian  Chain  Rule:  Let  F  :  Oi”  >-*  fR*”  be  Lipschitz  in  a  neighborhood  of  x  and  G  :  -R"*  *  0?*  be 

Lipschitz  near  F(x).  Then  G  o  F  is  Lipschitz  in  some  neighborhood  of  x  and  for  any  p  €  3?",  one  has 

d(G  o  F(x))p  C  convex  hull  {dG(F{x))dF{x)p}, 
and,  if  G  is  strictly  differentiable,  one  has; 

5(GoF(x))p  =  VG(F(x))(9F(x)p. 

Note  that  dF{x)  is  a  set  of  m  x  n  matrices,  since  F(x)  is  a  map  from  t— ►  IR"*,  and  dG{F{x))  denotes 
the  generalized  Jacobian  of  G  with  respect  to  F(x),  so  that  dG{F{x))  is  a  set  of  x  m  matrices,  and.  thus, 
d[G  o  F(x))  is  a  set  of  /c  X  n  matrices. 

Some  notational  details  are  given  here.  In  our  discussions  we  compare  the  properties  of  Clarke  general¬ 
ized  differentiability  versus  the  common  notion  of  differentiability.  For  a  functional  /,  first  and  second  order 
generalized  differentiability  properties  are  discussed,  ’^or  a  function  F,  first  order  generalized  differentiability 
is  discussed.  For  ordinary  differentiable  constructs,  use  the  symbol  V  for  the  gradient,  the  symbol  for 
the  Hessian,  and  use  the  caligraphic  J  symbol  for  the  Jacobian.  We  reserve  the  use  of  the  notation  df  and 
OF  to  mean  th,.'  Clarke  generalized  subgradient  set  and  generalized  Jacobian  set,  respectively,  throughout 
the  dissertation  unless  otherwise  noted.  Suppose  /i  is  twice  differentiable.  Suppose  /2  is  once-differentiable 
and  V /2  is  Lipschitz  continuous  of  rank  7  on  some  neighborhood  of  x  which  assures  that  /2  is  second  order 
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generalized  differentiable  at  x.  Suppose  F\  is  differentiable  at  x,  and  F2  is  Lipschitz  continuous  of  rank  7 
on  some  neigliborhood  of  x,  thus  assuring  Fz  is  first  order  generalized  differentiable  at  x.  The  notation  for 
sunbols  useii  to  denote  differentiable  and  generalized  differentiable  constructs  is  summarized  in  Table  1, 
Notation  for  Dilferential  ('onstructs,  below: 


Order 


Diflferential  Construct  Generalized  Differential  Construct 


First  Order  (Jacobian]  JFx(x) 

Second  Order  (Hessian)  ^^/i(-r) 


J2(x)  e  dFzix) 
Hzix)  €  3V/2(x) 


Table  1.  Notation  for  Differential  Constructs 


W'e  prove  several  propositions  using  the  upper  semicontinuity  of  dV f,  where  /  is  a  convex  function. 


Proposition  1.3.1.  Seighborhouu  of  Invertib'dity.  •> 

Let  /  be  a  C'  convex  function  (which  implies  that  dVf{x)  exists  everywhere).  Let  x,  be  given  and  as¬ 
sume  any  generalized  Hessian  H{x,)  e  i^V/(i,)  is  invertible.  Then  0  =  sup{|l//“'(.T,)||  |  H{x,)  e  dV /(i, )} 
satisfies  J  <  -l-x.  Further,  there  exists  a  neighborhood  Af(x.,e)  of  i.,  such  that  if  x  N{x,,e),  then  any 
Mix)  €  dV/(x)  is  invertible  and  ||//“'(x)l|  <  23. 

Proof.  Since  the  generalized  Hessian  set  is  compact,  then  3  -  max{||//“’(x,)l|  |  H(x,)  €  dV/(x,)} 
IS  well-defined.  Using  the  upper  semicontinuity  of  the  generalized  Hessian  map  dV/,  choose  t  small  enough, 
.so  that  for  x  €  .V(x.,^)  and  any  H{x)  €  dV/(x),  then  some  H{x,)  satisfies: 

|l//(x)-//(x.)||  <  4- 


Then,  for  any  x  G  .V(x..e),  any  H{x)  €  dVf(x) 


\H~Hx.]{H(x)  -  Hix.m  <  ||//-'(x.)|!  \\H{x)  -  //(x.)ll  <  3- 


1  1 


23  2 

By  the  theorem  of  invertibility  of  perturbed  matrices,  the  matrix  H{x)  is  invertible  and  satisfies: 

l|W-‘(x.)|| 


W‘(x)||  < 


1  -  \\H~'{x.)(H{x)  -  H{x,) 


<  20. 


If  /  is  (3  convex,  using  the  definition  of  the  generalized  Hessian  and  the  property  that  is  symmetric, 
positive  semi-definite  for  convex  functions,  then  any  generalized  Hessian  of  a  C'  convex  functional  is  sym¬ 
metric,  pfjsiti  ve  senni-definite,  being  a  convex  combination  of  a  finite  number  of  limits  of  positive  semi-definite 
matrices  See  Hiriart-Urruty[1986|. 
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Proposition  1.3.2.  Neighborhood  of  Positive  Definiteness. 

Let  /  be  a  C'  convex  function.  Let  x,  be  given.  Assume  that  any  generalized  Hessian  H(x. )  G  dV/(x, ) 
is  positive  definite.  Then  there  exists  a  neighborhood  N(x,,t)  of  x,,  such  that  if  x  €  N{x,.t),  any  generalized 
Hessian  in  H{x)  G  dV f{x)  is  positive  definite. 

Proof.  Since  the  generalized  Hessian  set  is  compact,  one  can  solve  for  H{x,)  G  5V /(x.)  and  d  where 

IMII  =  1  : 

A  :=  min  {d.H(x,)d)>0 
subject  to:  ||d||  =  1 

H(x,)  G  5V/(x,)- 

Using  the  upper  semicontinuity  of  the  map  dV/(x,).  choose  e  >  0  small  enough  so  that  for  x  G  N{x..e). 
any  H{x)  e  dV/{x),  then  some  H{x,)  G  dV/(x,)  satisfies: 

;|//(x,)-//(x)||  < 

Then,  for  any  d  where  ||d||  =  1,  by  the  Cauchy-Schwartz  inequality, 

A  +  {d,  -H{x)d)  <  \{d,  {-H{x)  +  Hix,))d)\  <  ||//(x)  -  //(x.)ll  < 
which  implies  that  {d,  H{x)d)  >  |1 

1.4  Han’s  Algorithm  for  Solving  Linear  Inequ  Jities 

Han  (I980|  describes  a  method  to  solve  a  linear  system  of  inequalities  in  a  least  square  sense.  We  give  a  brief 
description  of  this  method,  and  develop  and  verify  properties  of  the  iterates. 

Let  A  be  a  linear  transformation  from  8?"  to  9?'”,  and  let  b  G  S’”.  Han’s  method  provides  a  solution  to 
a  system  of  inequalities,  Ax  <  b,  whether  feasible  or  infeasible,  by  solving  in  a  least  square  sense: 

min  fix)  :=  i((Ax  -  6)+,  (Ax  -  6)+),  (L4.1) 

where  (Ax  -  denotes  the  projection  of  Ax  —  6  onto  S’J’.  The  optimal  residual  z,  =  (Ax.  —  6)  +  ,  where  x, 
solves  ( 1 .4.1),  is  unique  and  satisfies  A^z,  =  0.  One  can  represent  ( 1 .4. 1 )  as  a  convex  quadratic  programming 
problem  with  linear  constraints  in  9?'”+": 

min  \{z,z} 

X,Z  £, 

subject  to:  (14.2) 

Ax  —  6  <  z 

Since  Ax  —  6  <  z  is  strictly  feasible,  and  since  the  objective  function  is  convex  and  quadratic,  there  exists  a 
solution.  The  Karusch-Kuhn-lhcker  (KKT)  conditions  for  a  solution  (z,x)  to  (1.4.2)  are: 

A'z  =  0, 

Ax  —  6  <  z, 
z  >  0, 

(z.  Ax  —  b  —  z)  =0. 
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Denote  I{x)  :=  {t|  {ai,x  —  bi)  >  0}.  In  most  cases,  the  dependence  of  x  is  evident  in  context,  so  for  simplicity 
of  notation,  I  =  I{x)  is  used.  Denote  Ai  as  the  submatrix  of  A  containing  rows  i  6  /.  Notice  that  if  /(x)  =  0, 
then  X  is  a  solution.  Given  an  iterate  x*,  let  I  =  I(xk)  and  define  the  search  direction  d*  as; 


dk.=  -A\{AiXk-bi),  (al) 

where  A\  is  the  psuedo- inverse.  A\  gives  the  unique  least  norm  least  square  projection  of  -{Aixk  -  b[)  onto 
the  subspace  spanned  by  ai,t  e  /  =  /(x*).  Han  verifies  that: 


A]Atdk  =  -A^{Axk-b)+  =  -V/{xfc). 


Calculate  the  next  iterate,  :=  Xk  +  t^dk  where  Ajt  solves  the  exact  line  search: 


min 

A 


-^fixk  +  Ad/fc)- 


(a2) 


Han  shows  finitely  terminating  global  convergence  to  a  global  (but  possibly,  non-unique)  minimum.  One  can 
express  the  algorithm  succinctly,  as  follows:  Let  the  iterate  x*  be  given.  Let  dk  be  calculated  as  the  unique 
least  norm  solution  of  least  square  solution  of; 


Afdk  = -(AiXk -bj).  (i>l) 

Let  Zk  :=  {AiXk  -  bj),  and  let  A*  >  0  be  calculated  so  that: 

{(zk  +  Afci4djfc)+,  Adk)  =  0.  (62) 

Because  the  line  search  function  is  piece-wise  quadratic  and  convex,  the  exact  minimizer  Xk  is  easily  found. 
Let  Xfc+i  :=  Xk  +  Afcdfc.  If  V/(xfc-i)  =  0,  then  x^+i  solves  (1.4.1);  otherwise,  iterate. 
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Properties 

VVe  show  that  the  iterates  satisfy  several  other  properties  in  the  following. 

1.  For  :=  +  XkAdic,  since  the  objective  function  is  convex,  the  subgradient  relationship  implies  the 

objective  function  decrease  satisfies; 

>  2Xk\\Ardk,A[dkf 

2-  {4^^k)-i^k+v^k+i}  =  {2k^zk)-{Zk-kU^k  +  \kAdk}  =  (Zk-K+i’^k),  by  property  (b2)  and  properties 
of  the  projection,  2^. 

3.  If  4  =  4+1,  then  (2^,2;^')  -  {zt+\>^k+\)  =  -^kA,dk),  by  computation. 

4.  If  Aidk  =  0,  then  Xk  is  a  solution,  that  is,  A^z^  =  0. 

5.  If  Ik  =  4+1,  then  (2^,2^)  -  =  0-  Letting  /  =  4  and  from  1  above,  given  that 

i^k^^k)  -  (4+i-4+i>  ^  2Xk\\Aidk,A,dkf, 


then 

(2/,  -XAfdk)  —  2{XAidk,  Aidk)  >0,  * 

(2/  +  XAidk,  -XAidk)  ~  {XAidk,  A,dk)  >  0,  (1.4.3) 

-{XAidk,  Aidk)  >  0, 

which  implies  that  Aidk  =  0.  This  means  that,  if  at  any  time  the  index  set  of  the  current  iterate  Ik  is 
the  same  as  the  index  set  of  the  next  iterate  4+i,  then  the  algorithm  converges  at  step  fc  +  1. 

6.  By  the  Projection  Theorem,  if  dk  solves  (al),  then  for  all  p  e 

{z[  +  A[dk,  A]p)  =  0, 

which  implies: 

(2/  +  Aidk.Ap)  =  0. 

7.  By  (bl)  and  (b2),  there  is  a  special  conjugacy  relationship  of  the  current  direction  dk,  and  the  new 
direction  dk+i'. 

{Adk,Al^^^dk+l)  =  0. 

Han’s  algorithm  has,  in  practice,  fast  convergence  to  a  solution,  whether  feasible  or  infeasible.  Han 
shows  that  after  a  finite  number  of  iterations,  the  iterates  satisfy  a  relationship: 

■4c4+i,  (1.4.4) 

and  from  (1.4.3),  this  implies  that  after  at  most  m  more  iterations,  then  the  algorithm  must  converge.  This 
special  neighborhood  of  a  solution  x,  is  called  an  identification  neighborhood,  in  that  unmet  and  active 
constraints  of  the  solution  are  a  superset  of  those  neighboring  points’  unmet  and  active  constraints. 
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1.5  Background  on  Least  Square  Algorithms  for  the  Smooth  Case 

In  addition  to  Han’s  method  for  solving  a  system  of  inequalities  in  a  least  square  sense,  there  are  other 
minimization  methods  available  for  solving  a  system  of  equations  in  a  least  square  sense,  a  smooth  problem. 
For  example,  if  one  wishes  to  solve  for 

F(x)  =  0, 

where  F  ;  3?"  3?”*  is  one  defines 

f(x):=~{F(x),F(x)}  (1.5.1) 

and  solves: 

inf  f(x). 

These  minimization  algorithms  depend  on  two  important  calculations  at  each  iterate  x,  deriving  a  search 
direction,  d,  and  a  step  size  A,  such  that  the  step  taken  is  Ad.  Most  minimization  algorithms  for  least  squares 
find  a  descent  direction  by  solving  Bd  =  — V/(x),  for  d,  where  V/(x)  is  the  gradient  of  the  current  iterate 
and  B  is  an  n  X  n  matrix  determined  by  the  algorithm.  Denote  J(x)  ;=  JF[x),  the  Jacobian  of  F  at  the 
current  iterate  x. 

1)  If  S  =  /,  where  I  is  the  identity  matrix,  then  d  is  the  steepest  descent  direction. 

2)  If  B  =  J^(x)  J(x),  then  d  is  the  Gauss-Newton  direction. 

3)  If  B  =  (x)J(x)  -I-  fil,  where  /i  >  0  is  chosen  to  assure  that  J^{x)J{x)  -i-  is  positive  definite,  then 

d  is  the  Levenberg-Marquardt  direction. 

4)  If  B  is  a  rank  1  or  rank  2  update  approximation  of  the  Hessian  of  /,  which  satisfies  the  quasi-Newton 
condition,  then  d  is  a  quasi-Newton  direction. 

5)  If  B  =  B(x),  the  Hessian  of  /  at  x,  then  d  is  the  Newton  direction. 

Notice  that  calculating  the  search  direction  requires  that  the  function  /  be  at  least  once  differentiable, 
as  in  the  case  of  the  steepest  descent  direction.  Newton’s  method  uses  second  order  information.  An 
important  property  of  B  is  that  it  be  positive  definite  to  assure  a  descent  direction.  Therefore,  one  must 
define  an  algorithm  which  accounts  for  this.  If  B  is  not  positive  definite,  it  may  not  be  invertible,  and  the 
Gauss-Newton  and  Newton’s  methods  can  be  generalized  by  solving  Bd  =  —g{x)  in  a  least  square  .sense. 

There  are  specialized  algorithms  to  solve  least  square  problems.  Using  second  order  methods,  such  as 
Newton’s  method  or  quasi-Newton  methods  to  solve  least  square  problems  do  not  take  advantage  of  the 
special  structure  of  the  least  square  objective  function.  Notice  that  the  Hessian  of  the  objective  function  can 
be  expressed:  V^/(x)  =  J^{x)J[x)  +  Q(x),  where 

m 

Qix)  :=  Yi  Fi{x)U{x), 

»=i 

where  ri(x)  is  the  Hessian  of  Fi(x).  In  cases  where  the  problem  has  a  zero  or  small  residual  solution,  Q(x) 
is  dominated  by  J(x)^J(x)  close  to  the  solution.  See  Varah  [1990]  and  Gill,  Murray  and  Wright  [1981].  A 
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small  residual  problem  is  characterized  by  properties  of  the  Jacobian  of  F  at  the  solution  and  the  magnitude 
of  the  residuals.  Because  of  the  special  structure  of  the  non-linear  least  squares  problem,  one  usually  uses 
the  Gauss-Newton  method  or  the  Levenberg-Marquardt  algorithm  for  zero  or  small  residual  problems. 

Newton’s  method  and  quasi-Newton  methods  are  also  applicable  for  solving  non-linear  least  squares, 
particularly  for  those  having  large  residuals.  In  section  4  we  adapt  these  methods  to  solve  systems  of 
inequalities,  by  selecting  a  generalized  Jacobian  or  Hessian  at  each  iterate. 

Inexact  line  search  methods  compare  the  predicted  step  size  and  the  decrease  in  function  value  versus 
the  actual  step  size  and  function  value.  For  example,  given  a, /?,  such  that  0  <  a  <  /3  <  1,  the  Armijo  line 
search  conditions  require  A  to  satisfy, 


f{x  +  Ad)  <  f{x)  -I-  a{g{x),  Ad), 
{5(x  +  Ad),Ad)  >  l3{g{x),Xd). 


(1.5.2) 


Conditions  on  the  function  /  which  assure  the  existence  of  A  to  satisfy  (1.5.2)  are  that  V/  be  continuous, 
that  the  search  direction  d  is  a  descent  direction  of  /  at  x,  and  that  {/(x  Ad)|  A  >  0}  is  bounded  below. 

The  model  trust  region  (or  restricted  step)  method  uses  a  model  function,  typically  a  second  order 
(Taylor  series)  approximation  of  /  at  x  to  compare  the  predicted  versus  actual  decrease  in  the  function, 
adjusting  the  step  size  appropriately.  Suppose  M{$)  :=  /(x)  +  {g(x),s)  +  ^{s,H{x)s),  where  g{x)  is  the 
gradient  of  /  at  x  and  H{x)  is  the  Hessian  of  /(x).  Then  the  Taylor  series  approximation  gives:  /(x  -f  s)  « 
M(s),  about  X.  The  method  finds  a  specific  step,  s,  to  satisfy  agreement  between  the  model  function  and 
the  function  /.  An  inital  6  >  0  is  chosen.  The  subproblem  below  is  solved  for  s: 


min  M(s)  subject  to  ||s||  <  6. 

Calculate  the  ratio: 

_  /(x)-/(x  +  s)  ^  /(x)-/(x  +  s) 

A/(0)  -  M{s)  fix)  -  M{s) 

1)  If  r  <  .25  ,  then  set  the  new  8  :—  ||sll/4. 

If  r  >  .75  ,  and  ||5||  =  8,  then  set  the  new  8  :=  28, 
otherwise,  set  the  new  8  :=  8. 

2)  If  r  <  0,  then  set  new  x  :=  x;  else,  set  new  x  :=  x  -1-  s. 

The  model  trust  region  method  may  not  take  a  step  at  each  iteration,  and  the  method  does  not  require 

that  the  Hessian  be  positive  definite.  The  choices  of  parameter,  above,  for  example,  .25,  and  .75,  are  based 
on  computational  experience.  Solving  subproblem  (1.5.3)  is  often  done  approximately. 

The  model  trust  region  method  requires  a  second  order  approximation  of  the  function.  In  section  4,  we 
show  that  using  the  generalized  Hessian  provides  this  approximation. 


(1.5.3) 

(1.5.4) 
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2  Piece-wise  Linear  Assignment  Problem 

In  this  section,  an  assignment  problem  having  a  piece-wise  linea;  objective  cost  function  is  described.  The 
piece-wise  linear  objective  function  of  use  here  is  the  maximum  of  a  constant  function  and  one  or  several 
linear  functions.  This  problem  is  the  motivation  and  an  application  of  algorithms  developed  in  sections  4 
and  6.  This  type  of  piece- wise  linear  objective  cost  function  is  useful  in  modeling  the  costs  of  assigning  jobs 
to  individual  servers,  each  of  which  have  fixed  minimum  charges  and  linear  rates  above  the  minimum.  The 
problem  is  expressed  as  a  {0, 1}  programming  problem  for  which  a  strong  relaxation  is  derived.  The  dual 
of  the  relaxed  primal  problem  is  derived,  as  well  as  optimality  conditions  for  the  relaxed  primal  solutions. 
Alternative  methods  for  solving  for  (0, 1 }  solutions  within  the  relaxed  problem  solution  set  are  described 
and  compared. 

This  assignment  problem  belongs  to  a  family  of  problems,  called  the  uncapacitated  facility  location 
problem  It  is  a  generalization  of  the  problem  described  in  Conn  and  Comuejols[1990). 

2.1  Statement  of  Problem 

Let  t  e  I  =  {1,2, . . . ,  m}  represent  jobs  to  be  assigned  to  servers  j  €  J  =  {1,2,3, ...  ,n}.  Let  1'"  e  9?'" 

m 

represent  the  unit  demand  vector  of  the  jobs.  For  each  j  e  d,  i  €  /,  let  y,  j  be  the  relative  amount  of  demand 
i,  assigned  to  server  j,  i.e.,  y  €  /T"".  The  variable  y  is  constrained  so  that  0  <  yi  j,  for  each  j  €  J,  i  €  I. 
Further,  the  variable  y  must  meet  the  demand  for  each  i.  That  is,  for  all  i  e  /, 

Let  positive  linear  cost  coefficients,  Cjj,  and  positive  minimum  cost  constants,  (pj,  be  associated  with  each 
server  j  and  job  i.  That  is,  assume  for  all  t  e  /  and  j  e  J,  that  c,j  >  0  and  (pj  >  0.  To  simplify  notation, 
interpret  the  vector  y,j  to  be  the  elements  of  y  €  9?"*”,  associated  with  server  j.  The  cost  function  for  each 
server  j  is: 

(  max{<pj,  (c.,j,  y.j))  if  any  yi^  >  0, 

\0  ifyi,j=0.  Vie/. 

This  server  cost  function  represents  a  flat  minimum  charge  with  linear  rates  beyond  the  minimum.  Some 
restaurants,  for  example,  charge  a  flat  table  charge,  even  if  the  sum  of  the  menu  items  fall  beneath  this  price. 
Figure  1  shows  the  graph  of  an  example  server  cost  function. 


Figure  1.  Example  Server  Cost  Function 
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One  can  form  the  strong  relaxations  of  the  cost  functions  by  introducing  variables,  1  >  >  0,  for 

j  £  J,  and  adding  the  constraints  that 

1  >  Wj  > 

The  relaxed  primal  problem  for  the  assignment  problem  is  then 


w,y  ■>  j  j 

3 


(2.1.1) 


subject  to:  e  I  E  Vi.j  —  T 
jiJ 


where  :=  m3ix{wj<f)j,  {c,j,y,j))  +  6{wj,y,j  |0  <  yi  j  <  Wj  <  l.Vi  e  I).  Here,  6  is  the  indicator 

function;  that  is, 


6(wj, 


0 

+00 


if  the  condition  is  satisfied, 
if  the  condition  is  not  satisfied. 


Notice  that  fj  is  convex,  being  the  pointwise  maximum  of  a  constant  function  and  a  linear  function.  By 
adding  a  variable  z  €  the  problem  can  be  expressed  in  standard  form  with  mn  +  2n  variables  and 
2mn  +  m  +  3n  constraints  as  follows:  * 


(P)  :  min  ^ 

j 

subject  to: 

Wj  -  yi,j  >  0, 

Vi,  j 

-■<t)jWj  +  >  0, 

Vj 

t 

Vj 

=  1. 

Vi 

3 

Vi.]  >  0, 

Vi,  j 

The  relaxed  primal  problem  (P)  is  feasible,  and  the  optimal  value  is  finite,  since  the  cost  coefficients  are  all 
positive  and  the  equality  constraint  on  y  assures  a  closed  bounded  feasible  region  for  the  y  variable.  Thus, 
by  the  Duality  Theorem  of  Linear  Programming,  there  exists  solutions  to  the  dual  of  the  relaxed  problem. 
See,  for  example,  Kolman  and  Beck  [1980]. 
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2.2  Dual  of  the  Relaxed  Problem 

Define  dual  variables,  i>  e  9?”*".  r,  s,t  e  3?",  and  ti  e  so  that  the  dual,  directly  derivable  from  (P)  is  : 


By  substituting 


(D)  :  max 

u,  -  y]  t, 

t 

j 

subject  to: 

=0, 

t 

Vj 

I'i  j  +  u,  —  Ci  jVj  <  0. 

Sj+rj  =  1, 

Vij  >  0, 

Vi,  j 

'J!*’ 

IV 

P 

Vji 

Sj  >  0, 

Vj 

Tj  >  0, 

Vj 

Ui  unconstrained,  Vi 

(2.2.1) 


since  i/j  j  >  0,  and  <t>j  >  0;  and  substituting,  rj  =  I  —  Sj  >  0.  one  can  eliminate  the  variables  r  and  s  to 
obtain: 

(D)  :  max 

i  j 

subject  to: 


^i,j  ~1"  Uj  C~i^j 


<  0,  Vi,y 
Vji 


(2.2.2) 


h  >  0,  V> 
Vi,j>0,  Vi,j 


Ui  unconstrained,  Vi. 

The  dual  has  mn  +  m  +  n  variables  and  2mn  +  2n  constraints.  Solving  the  dual  problem  may  be  preferable, 
if  there  are  fewer  variables  and  constraints.  Then,  after  solving  the  dual,  one  must  retrieve  a  solution. 
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2.3  Solving  for  the  Primal  from  the  Relaxed  Dual 


Since  (P)  and  (D)  are  linear  programming  duals  of  each  other,  and  if  u,  v,  t  (  and  implicitly  r  and  s)  solve 
(D),  then  there  exists  w,  y,  z  which  solve  (P).  The  optimality  conditions  which  mtrst  be  satisfied  can  be 
expressed,  for  optimal  u,  c,  t  and 


/<t>v 


rj  =  I-  Sj. 


Then,  from  (P),  it  follows; 


Vt.j 

>  0, 

'ii,j 

-<t>jWj  +  Zj 

>0, 

Vj 

>  0, 

Vj 

=  1, 

Vi 

i 

Wj 

<  1, 

Vj 

Vi.i 

>0, 

Vi,  j 

Further,  by  complementarity  conditions  for  (P)  and  (D),  the  following  must  be  satisfied; 


-  yi,j)  =  0,  Vi,j 
Sj(-<i>jWj  +  Zj)  =  0,  Vj 

i 

tj(wj -l)=0,  V> 
yi.j{-Vi,j  +  Uj  -  Cijrj)  =  0,  Vi,  j. 


Using  the  above  conditions,  one  can  express  the  solution  for  w,  y,  z  as  sl  solution  to  a  system  of  linear 
inequalities  with  respect  to  the  following  index  sets; 

01 'j  €  (0.  1)}  -!=>  Wj(pj  =  =  Zj 

i 

J~  •—  ^  Wj<t>j 

t 

J~  {jjrj  0}  >  ^  ^ 

X 

Ij  .  {i|  ^i,j  ^  ^  yi,j  ~  ^ 

IJ  :=  >  0}  j  =  Wj. 
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Solve  the  following  system  for  w,  y,  and  z:  (Note  that  z  is  an  unnecessary  variable.) 


y,.,  =0, 

yt,>  >  0,  Vi.j 

=  Vi 

j 

yij=rvj, 

<  1,  Vj 

Vi.j  <  Vi,y  (2.3.1 ) 


tj(w^  -  1)  =  0,  Vj 

~  Zj  —  Wj<pjj 

% 

Vj  e  J= 

^  ^ 

Vj  € 

^  ^  =  Zj, 

Vj  e 

2.4  Sample  Problem 

In  order  to  illustrate  the  primal  and  dual  relationship,  the  following  simple  problem  is  shown.  Let  m  =  4, 
n  =  3;  let  0  =  (26,30,30);  let 

/20  25  25  \ 

^  6  5  5  1 

^  40  40  44  I 

\30  30  22/ 

The  optimal  value  is  87.—  ,  achieved  at,  for  example,  f  =  (1,0, 1)  and  w;  =  (1,0.  I)  and: 


0  0  0  \ 

00  01’^ 

0  0  2.^/ 

si  =  S2  =  0,  and  S3  =  Thus,  rj  =  r2  =  1,  and  =  |y. 
optimality  conditions  (2.3.1). 

First,  consider  j  =  2,  where  i/i,2  =  0  for  all  i,  satisfying  /J  = 
consistent,  since  J-  =  {1,2}  and 


/  20  \ 

5 

\  V- 

40 

\,  t;- 

\22-^) 

\ 

/I  0 
0  0 


We  show  that  this  solution  satisfies  the 


{1,4},  and  =  0.  Further,  u;2  =  0  is 


(c.,2,  y*,2>  =  0  >  W2(i)2  =  0  *  30  =  0. 

Second,  consider  j  =  3.  /§  =  {1}  and  =  {2,4}.  It  must  be  that  rs  =  {y  £  (0, 1)  and  -  {3}.  It  must 
be  that  ^  =  1,  which  implies  that  1x73  =  1  =  j/2,3  To  satisfy. 


{c..3.y..3>  =  =  30, 
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then  y3,3  = 

Lastly,  consider  j  =  1,  where  =  {2,4}  and  /J"  =  0.  Since  1  €  J-.  then 


(c.,i-2/*.i>  57—  >  wi4>i  =26. 

J/1,1  =  1  <  w,  =  1;  yi,3  =  ^  <wi. 

For  i  =  3,  -  h  Further,  for  all  i,  —  h 

The  full  set  of  solutions  of  this  problem  include:  yi  j  =  0,  except  as  follows. 

yi.l  =  ICl  =  1.  t/2.3  =  J/4.3  = 

and  u)2, 2/3.2,  and  2/3,1  satisfy: 

1112  ^  l> 

6 

y3,i  > 

41 

2/3,2  "  “  ya.i’ 

„  4 

0  <  2/3.2  <  W2  <  jy3,2- 

See  Figure  2,  Example  Solution  Set  in  2  Dimensioa®,  W2  and  2/3,1- 


Figure  2.  Example  Solution  Set  in  2  Dimensions 
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2.5  Methods  to  Solve  Assignment  Problem 

The  attractiveness  of  the  strong  relaxation  of  the  facility  location  problem  is  seen  in  experimental  results. 
For  example,  Cornuejols  et  al.  [1989j  define  the  convex  hull  of  the  uncapacitated  facility  problem  solutions  as 
the  '"uncapacitated  facility  polytope” .  They  point  out  that  often  there  is  a  {0, 1 }  solution  within  the  relaxed 
solution  set;  and  if  not,  a  solution  can  be  found  after  a  small  number  of  cutting  plane  iterations.  Their 
experimental  results  show  a  small  difference  in  the  objective  function  values  between  the  two  problems,  the 
relaxed  and  {0,  1}  problem.  Thus,  most  methods  to  solve  this  problem  use  linear  programming  relaxations, 
although  there  are  great  variations  on  how  to  solve  these  subproblems.  Traditional,  primal  simplex  and 
cutting  plane  methods  are  used.  Conn  and  Cornuejols  |1990],  for  their  problem,  solve  the  dual  of  the  relaxed 
primal  using  projections.  If  solving  the  dual  is  more  efficient,  one  advantage  of  this  may  be  that  a  (perhaps, 
incomplete)  set  of  primal  solutions  are  characterized  through  the  complementary  slackness  conditions  of  each 
dual  solution.  It  is  desirable  to  find  an  efficient  method  to  search  the  set  of  primal  solutions  generated  from 
a  given  dual  solution. 

The  primal  solution  set  generated  by  a  given  dual  solution  is  a  polyhedral  convex  set  being  the  solution 
of  (2.3.1).  The  conditions  on  w  assure  that  w  is  bounded.  The  condition  on  the  y  variable,  j  =  1,  for 
all  i,  and  that  yij  >  0,  for  all  i,j  assure  that  y  is  bounded.  So  the  solution  set  of  the  relaxed  problem  is 
also  a  polytope.  Searching  for  {0, 1}  solutions  among  the  vertices  of  this  polytope,  via  direct  search,  could 
require  2"  iterations,  where  at  each  iteration,  a  solution  in  the  variable  y  must  be  computed  or  found  not  to 
exist. 

The  approach  proposed  in  this  dissertation  is  to  apply  a  penalty-type  method,  adding  equations  to  the 
system,  which  can  be  solved  using  continuous  methods.  As  an  example  of  a  penalty  for  {0, 1 }  conditions 
on  w,  one  may  choose  to  add  the  condition,  Wj(l  -  Wj)  =  0,  for  all  j.  This  non-linear  condition  has  the 
property  that,  if  met,  guarantees  a  {0,1}  solution  in  Wj.  However,  the  non-linear  property  demands  more 
powerful  solution  methods. 

One  can  express  the  system  of  equalities  and  inequalities  in  (2.3.1),  adding  the  condition,  vijj(l-Wj)  =  0 
in  more  convenient  notation.  Use  a  single  variable  x  to  represent  the  variable  (w,y),  noting  that  z  is  an 
aiixillary  variable  determined  by  the  variable  {w,  y).  Define  the  function  F  to  represent  the  nonlinear  coercive 
function  of  w.  For  example,  Fj  :  di  >-*  di,  and  Fj(wj)  :=  ~  Let  the  matrix  A  and  the  vector  b 

represent  the  linear  transformations  in  the  inequalities,  and  the  matrix  C  and  the  vector  d  represent  the 
linear  transformations  in  the  equalities.  Then,  solve  the  system  below,  finding  x  which  satisifies: 

F(x)  =  0, 

Ax<b,  (2..5.1) 

Cx  —  d. 

This  system  has  linear  inequalities  and  nonlinear  and  linear  equalities.  In  the  introduction,  Han’s  method  to 
solve  systems  of  linear  inequalities  in  a  least  square  sense  was  described.  One  approach  to  solve  (2.5.1 )  is  to 
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extend  Han’s  method  to  include  non-linear  equations.  Burke  [1984]  describes  one  algorithm  to  solve  the  linear 
inequalities  with  non-linear  equations  in  a  least  square  sense  The  search  direction  is  derived  by  projecting 
(least  norm  least  square)  the  negative  of  the  residuals  onto  the  space  spanned  by  the  coefficients  of  unmet 
inequality  constraints.  He  uses  an  Armijo  line  search  to  gain  global  convergence.  The  least  square  least  norm 
projection  uses  a  singular  value  decomposition  to  find  a  search  direction.  We  will  show  that  common  search 
direction  methods  are  applicable,  such  as  Gauss-Newton,  and  quasi-Newton  methods.  The.se  methods,  using 
direct  inverses  or  projections  using  QR  decomposition  with  column  pivoting,  are  finite  processes  for  finding 
a  search  direction  and  can  be  more  efficient  in  computation.  Further,  the  other  major  global  convergence 
technique,  the  trust  region  method,  is  examined,  considering  various  model  functions. 

It  is  not  advisable  to  use  a  penalty  function  having  a  large  penalty  parameter  or  a  barrier  function  to 
coerce  a  solution.  Large  penalty  parameters  are  known  to  give  ill-conditioned  subproblems  and  are  compu¬ 
tationally  impractical,  especially  for  this  problem,  since  solutions  will  necessarily  occur  at  the  boundary  of 
the  feasible  set. 

Using  Newton’s  method  to  directly  solve  the  system  of  equations; 

F(x)  =0, 

(Ax-6)*=0,  (2.5.2) 

Cx  =  d. 

is  difficult  due  to  the  non-differentiability  property  of  (Ax  —  b)+.  Advances  in  non-differentiable  optimization 
have  yielded  new  methods;  but,  to-date,  these  “bundle  ”  methods  have  suffered  fmm  poor  stability  and  rates 
of  convergence.  See  Lemarechal[1986|.  Recent  developments,  Schramm  and  Zowe|19911,  use  trvist-region 
features  to  increase  the  bundle  method  performance.  However,  the  special  structure  of  this  problem  suggests 
again  that  more  efficient  methods  can  be  found. 
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3  Properties  of  the  Least  Square  Formulation  for  Solving  a  System  of  Linear  Inequalities 

Prom  the  discussion  at  the  end  of  section  2.5,  a  more  promising  approach  to  solve  (2.5.1)  is  a  least  square 
approach  using  a  small  penalty  parameter,  such  as  1.  This  approach  has  two  strong  properties.  First, 
the  objective  function  is  differentiable.  Secondly,  we  show  in  this  section  that  the  least  ^.luares  problem 
with  small  penalty  parameter  is  numerically  stable  under  small  perturbation  due  to  inaccurate  computation. 
Fu'ther,  there  are  many  choices  of  algorithms  available  for  minimization  of  differentiable  functions,  several 
of  which  will  be  analyzed  and  adapted  for  use  in  this  problem.  The  system  is  solved  by  minimizing  the  norm 
of  the  residuals  of  the  inequalities  and  equalities. 

In  section  3.1,  we  define  the  least  square  formulation  for  solving  a  system  of  nonlinear  equations  and 
linear  equalities  and  in  section  3.2  develop  numerical  stability  properties  of  the  least  square  objective  function. 
In  .section  3.3,  a  special  neighborhood  of  a  designated  point,  called  an  identification  neighborhood  is  defined 
in  relation  to  a  finite  system  of  weak  linear  inequalities.  Points  in  an  identification  neighborhood  of  the 
designated  point  satisfy  the  same  strict  linear  inequalities  as  does  the  designated  point,  with  the  exception 
of  those  weak  linear  inequalities  which  the  designated  point  satisfies  with  equality.  This  means  that  the 
points  in  an  identification  neighborhood  are  on  the  same  side  of  all  separating  hyperplanes  (determined  by 
the  weak  linear  inequalities)  as  the  designated  point  except  for  those  hyperplanes  on  which  the  designated 
point  lies.  Further,  in  section  3.3.  generalized  differential  constructs,  first  order  and  second  order,  are  defined 
for  the  system  of  inequalities  and  the  least  square  objective  fur  tiou  Properties  of  Taylor  series  expansion 
based  on  generalized  differential  constructs  to  F  ally  approximate  the  least  square  objective  function  are 
analy-zed. 


3.1  Statement  of  Least  Square  Problem 

We  wish  to  find  r  €  /?”  which  solves  the  following  system  of  linear  inequalities  and  equalities  and  non-linear 
equalities. 


Fi(x)  =  0, 
Ax  <  6, 


(3.1.1) 


where 

F\  :  5?"  >— »  5?"*'  and  Fi  is  non-linetir, 

/I  :  3?"  >— *  0?’"^  and  A  is  linear. 

This  is  a  mixed  system  of  linear  inequalities  and  equalities  and  non-linear  equalities.  Notice  that  the  linear 
equation.*-  and  the  non-linear  equations  are  combined  into  a  single  multivariate  function,  Fi.  Theoretically, 
these  are  treated  together;  but  one  should  exploit  the  simple  linear  equations  properties  in  computational 
algorithms.  One  can  transform  this  problem  into  a  least  squares  minimization  problem.  Denote  aj  to  be 
the  ith  row  of  ,4  and  to  be  the  jth  element  of  the  ith  row  of  A.  Recall  that  (Ax  —  6)+  represents  the 
projection  of  Ax  -  h  onto  VVe  define  F  :  3?"  >— 


Fix) 


\,(/tx-6).y  “  1,F2(x)J- 


20 


Let  /  ;  3?”  I-*  3?  be  defined; 


fix)  ■=  fiix)  +  /2(x),  where 
fiix)  ;=^{F,(x),F,{x)), 

f2ix)  ■■=  -  ^)+.  {Ax  -  b)  +  ). 

Then,  instead  of  solving  (3.1.1),  find  x  which  solves 

inf  fix).  (3.1.2) 

Although  the  two  problems,  (3.1.1)  and  (3.1.2),  are  not  equivalent,  notice  that  if  x,  solves  inf  /(x)  in  (3.1.2), 
then  X,  solves  (3.1.11.  If  inf  fix)  >  0  in  (3.1.2),  then  (3.1.1)  has  no  solution.  The  least  square  algorithms  do 
not  guarantee  that  a  global  infimum  is  returned  to  solve  (3.1.2).  They  provide  a  local  minimum  stationary 
point  X,  of  (3.1.2)  which  may  or  may  not  be  a  solution  for  (3.1.2).  For  example,  if  /  is  not  convex,  there 
may  be  non-global,  local  minima.  /  given  by  (2.5.1)  is  c  convex.  Even  though  the  algorithm  may  return 
a  local  minim.um  stationary  point,  we  show  in  section  7  that  a  local  minimum  solution  of  (3.1.2)  provides 
important  information  for  solving  the  assignment  problem. 

3.2  Stability  of  the  Least  Square  Problem  for  Solving  Systems  of  Linear  Inequalities  • 

One  aspect  of  numerical  stability  that  is  related  to  solving  optimization  problems  is  the  sensitivity  of  the 
solution  to  the  problem  due  to  small  perturbations  or  inaccuracies  of  computation.  One,  intuitively,  would 
characterize  numerically  stable  problems  as  ones  which,  given  small  inaccuracies  in  computation,  would  give 
small  inaccuracies  in  the  solution.  Certain  problem  formulations  are  known  to  be  numerically  unstable, 
such  as  penalty  functions  having  large  penalty  parameters  and  barrier  functions,  which  have  solutions  on 
the  boundary  of  the  feasible  set.  In  this  section,  we  show  that  the  formulation  of  the  least  square  problem 
for  solving  systems  of  inequalities  is  numerically  stable  to  small  inaccuracies  in  computation.  See  Robinson 
[1975,1976]  where  he  develops  stability  theory  for  systems  of  inequalities,  both  the  linear  and  non-linear  case. 

Consider  the  linear  transformation  A  in  (3.1.1)  where  A  :  3?"  3?”*^,  a  right  hand  side  b  G  and 

the  variable  x  G  3?".  The  system  of  linear  inequalities  is; 

Ax<b.  (3.2.1) 

Define  Q  ;=  {xjAx  <-6}  as  the  solution  set  of  (3.2.1),  where  D  may  be  empty.  We  say  that  (3.2.1)  is  solvable 
if  0  is  nonempty.  Robinson  defines  the  nonempty  solution  set  Q  as  stable  if  for  each  xo  G  D  there  are  some 
positive  constants  /3  and  S,  such  that  for  any  linear  transformation  A'  ;  5R"  S?"",  and  b'  G  3?'",  where 

||A  -  A'll  +-  |li>  -f>'||  <  6,  the  distance  from  xg  to  the  solution  set  of  the  perturbed  system,  D'  ;=  {xjA'x  <  b’} 
satisfies 

dist  (xo,n')  <  0pixo),  (3.2.2) 


where 


Notice  the  p(i)  =  ||(i4'x  -  6')+||.  It  is  not  sufficient  for  the  system  (3.2.1)  to  be  solvable  to  guarantee  Q 
is  stable.  We  define  dist  to  be  infinite,  if  Q’  is  empty.  A  simple  example  of  an  unstable  solvable 

system,  in  a  one-dimensional  variable  x,  is; 

X  <  0, 

X  >  0. 

For  this  system,  $1  =  {0}.  Yet,  for  any  6  >  0,  the  perturbed  sytem, 


X 


has  empty  solution  set,  meaning  that  the  solution  set  is  unstable. 

Robinson  defines  the  system  (3.2.1)  to  be  regular  if  there  exists  some  x  such  that  Ax  <  b.  This  is  a 
strict  consistency  condition.  Robinson  shows  that  a  system  (3.2.1)  is  regular  if  and  only  if  the  solution  set 
n  is  stable. 

Rather  than  formulating  the  problem  as  in  (3.2.1),  one  can  represent  the  problem  as  a  least  square 
formulation  by  finding  x  e  SR"  and  z  €  SR”*^  which  solve;  " 


inf  Uz,z)) 

x,z  2 

subject  to:  Ax  —  b  <  z. 


(3.2.4) 


Recalling  the  KKT  conditions  for  this  problem, 

=  0, 

2  >  0, 

(3.2.5) 

Ax  —  6  —  2  <  0, 

(2,  Ax  —  6  —  2)  =  0. 


The  system  of  inequalities  represented  in  (3.2.5)  include; 


-2  <  0, 
Ax  —  b  —  z  <  0. 


(3.2.6) 


By  extending  the  original  problem  (3.2.1)  to  (3.2.4),  a  least  square  formulation,  the  system  of  linear  inequal¬ 
ities  (3.2.6)  is  regular,  and  hence  the  solution  set  is  stable.  This  is  assured  even  if  the  original  system  (3.2.1 ) 
is  not  solvable,  since  (3.2.6)  satisfies  a  strict  consistency  condition  for  any  choice  of  A  and  b. 
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3.3  Approximating  Properties  of  Taylor  Series  Using  Generalized  Differential  Constructs 

Assume  that  /i  is  twice  continuously  differentiable,  which  is  the  case  for  the  subproblem  (2.5.1)  obtained 
in  the  assignment  problem.  Denote  //i(x)  :=  V^/i(x)  to  be  the  Hessian  of  /;,  and  denote  J\{x)  :=  JFx[x) 
to  be  the  Jacobian  of  Fi  (x).  Refer  to  the  differential  notation  that  was  outlined  in  Table  1  in  section  1 .3.  This 
table  clarifies  the  distinction  in  notation  between  generalized  and  single-valued  differentials,  and  between 
differential  constructs  for  /  and  F. 


Proposition  3.3.1. 


{Ax  -  b)+  is  uniformly  Lipschitz  of  rank  [|i4||  of  order  1  on  all  of  •)?".  That  is,  for  all  x,  y,  |l(i4x  -  6)*  - 
(Ay-6)+l|  <P1|  ||x-yl|. 

Proof.  :  For  all  i. 


(0 


\{{ai,x}-bi)+-{{ai,y)-bi)  +  \  = 


|{ai,x-y)| 
j(ai,  A(x  -  y))| 
.  l(ai,A(x-y))| 


if  ({aj,x)  -  bi)+  =  {{ai,y)  -  bi)+  =  0, 
if  both  (ai,  x)  >  bi  and  (a*,  y)  >bi  , 
some  A  6  (0, 1)  ,  if  {ai,x)  <  bi  and  (ai,y}  >  bi, 
some  A  e  (0, 1),  if  (ai,x)  >  bi  and  {ai,y}  <  bi. 


(3.3.1) 


The  first  two  cases  are  clear.  The  third  case,  where  (ai,x)  <  bi  and  (a^,  y)  >  bi,  means  there  exists  A  e  (0,  1), 

m 

such  that  z  =  y  +  A(x  -  y)  and  (ai,  z)  =  bi.  Then 


|((ai,x)  -6i)+  -({ai,y)  -  bi)  + 
\{ai,z)  -  bi  -((oi.y)  -bi)\  = 
|(ai,2  -y)|  =  |(ai,A(x  -y))|. 


The  fourth  case  is  shown  similarly.  1| 

Thus  the  Clarke  generalized  Jacobian  set  of  (Ax  —  b)+,  denoted  d(Ax  —  b)+,  is  well-defined.  Let  Q  be 
the  set  of  points  where  the  Jacobian  of  (Ax  —  b)+  does  not  exist  and  denote  Jiiyi)  to  be  the  Jacobian  of 
(Ay*  —  6)  +  ,  where  the  Jacobian  does  exist.  Then  d{Ax  —  b)+  is  the  set  of  all  m  x  n  matrices  J^ix),  where 
any  J2{x)  is  a  convex  combination  of  a  finite  number  of  limits,  J2,  k  —  ],...,  K,  some  K.  Each  limit  J2  is 
obtained  as  the  limit  of  a  sequence  of  the  form  J2{yi),  where  yi  x  and  yt  ^  fi.  That  is. 


Jn  :=  lim  Jiivi),  where  yi 

»i— * 


Proposition  3.3.2.  ” 

J2(x)  is  a  generalized  Jacobian  of  (Ax  —  b)+  if  and  only  if 


Mhj) 


0  if  (oj.x)  -  bi  <0, 

sf  if  (o<,x) >  0, 


(3.3.2) 


XiO-l  if  (dj.x)  —bi  =  0,  for  some  Aj  e  [0, 1]  . 

Proof.  For  this  function  (Ax  —  b)^.,  fl  =  {i|  {ai,x)  =  bi,  for  some  i}.  By  direct  calculation  of  limits.  , 
one  finds. 

J2  =  EkA  =  lim  J^iyi),  where  yi  ^  il. 


yi-*x 
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where  £*  is  an  m2  x  m2  diagonal  matrix,  with  elements  e*  on  the  diagonal,  where 

{0  if  iel-{x), 

1  ifje/"-(x), 

0  or  1  if  i  €  /°(x). 

Then  J2{x)  &  d{Ax  —  b)^  means 


K  K 

^  XkEkA,  such  that,  ^  =  1,  e  [0, 1], 

<c=l  fc=I 

Therefore,  one  can  express  J2{x)  =  DA  where  D  is  a  diagonal  m2  x  m2  matrix,  with  elements  d,  on  the 


diagonal,  where 


fO  ifie/-(x), 
d.  =  ^  1  if  i  €  /+(x), 

^  A,  if  i  g  f°(x),  where  A*  g  [0, 1|. 


Note  that  for  all  i,p  g  3?",  that  ||  J2(a:)pl|  <  ||i4|]  ljp||,  for  all  possible  choices  of  J2{x)  g  d{Ax  -  b)^. 
Commonly,  di(x)  is  assumed  to  be  Lipschitz  continuous  of  rank  7  on  some  open  convex  set  D.  That  is 
||Ji(x)  —  */i(y)||  <  7||x  —  j/ll  for  all  x, y  g  D.  The  following  proposition  follows  from  real  analysis  and  is 
found  in  most  elementary  material  on  continuous  optimization,  including  Dennis  and  Schnabel  [1983ji 

Proposition  3.3.3. 

If  Ji  is  order  1  Lipschitz  continuous  of  rank  7  on  an  open  set  D,  then 

11Fi(x  +  p)-F,(x)-J,{x)p||  <2(|p||2, 

where  x  and  x  +  p  €  D. 

Proof.  Since  Ji  is  Lipschitz  continuous,  it  is  differentiable  a.e.  on  D,  and  the  line  integral  parameterized 
along  X  to  X  +  p  satisfies; 

l|Fi(x +p)  -  Fi(x)  -  Ji(x)p||  <  [  ||Ji(x  + (p)  -  di(x)||  IIpII  df 

do 

<  /  7i|fp||  IIpII  dt 

do 

=  ‘y\\p\?  f  idt 

do 

=  ^IIPIP. 


Observe  that  J2{x),  a  generalized  Jacobian  of  (Ax  —  b)+,  need  not  satisfy  a  Lipschitz  condition. 
Proposition  3.3.4.  Existence  of  an  Identification  Neighborhood. 

Let  X,  be  given  and  let 

/°(x.)  :=  {i|(o„x,)  =  6i}, 

F^{x.)  :=  {»|(ai,x.)  >  bi}, 

J~(x*)  :=  {t|  (ai,x,}  <  bi}. 
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Then  there  exists  a  neighborhood  N(x,,r)  of  x,,  such  that  for  x  e  Nix,,  r): 


/0(x)C/°(x.), 

/+(x,)  =  /+(x)\/0(x.),  (3.3.3) 

/'(x.)  =  /-(x)\/«(x.). 

Proof.  Removing  the  constraints,  i  <5  allows  forming  a  neighborhood,  since  there  are  then  all  strict 

inequalities.  Define  hyperplanes  for  i.  Hi  :=  {xj  (aj,i)  —6  =  0},  and  let  r  >  0  be  chosen  so  that, 

r  <  min{  dist  (x,,  Hi)\i  e  /"^(x,)  U  /“(x,)}. 


If  X  e  N{x,,r),  then 

/°(x)c/°(x.), 

/+(x.)  =  /+(x)\/°(x.), 

/-(x.)  =  /-(x)\/°(x.). 


Proposition  3.3.5.  Perfect  Approximating  Property 

Let  X,  be  given  and  let  N(x,,r)  be  a  neighborhood  which  satisfies  (3.3.3)  in  Proposition  3.3.4.  Let 
X  €  N{x,,r)  and  let  p  :=  x,  -  x,  then 

1.  for  any  J^ix)  €  d(Ax  -  b)+, 

(j4(x  +  p)  -  6)+  -  (Ax  -  b)+  -  J2(x)p  =  0; 

2.  for  any  J2(x,)  €  d{Ax,  -  b)+, 

||(i4x  -  6)+  -  (i4x.  -  6)+  -  d2(x,)(-p)||  <  ll>l/o(i.)||  IIpII; 

3.  for  each  particular  x  €  N{x,,r),  there  is  at  least  one  J2(x,)  €  d(Ax,  -  6)  +  ,  such  that 

(i4x  -  6)+  -  (Ax,  -  b)+  -  J2(x,)(-p)  =  0. 

Proof.  For  1,  note  that  x  +  p  =  x..  Consider  any  i: 

1.1.  If  a  I°(x,).  This  implies  i  i  /°(x).  If  a  G  l~(x)  U  /+(x),  then  ( Jjfx));  is  unique.  In  either  case,  then 

((x  +p,ai)  -  6j)+  -  ({x,  a,)  -  6i)+  -  (J2(^))iP  =  0- 

1.2.  If  a  €  /“(x,).  If  a  €  /“(x)  U  /+(x),  then  (J2(x))i  is  unique.  If  a  G  /°(x),  then  (p,  o^)  =  0,  so  for  any 
( J2(x))i  =  Atti,  where  A  e  [0, 1],  (p,  Ao*)  =  0.  In  any  case,  then 

({x  +p,ai)  -  bi)+  -  ((x,  Oi)  -  6j)+  -  ( J2(x))tP  =  0. 
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Relation  (2)  says  that  all  the  error  of  the  first  order  approximation  is  in  the  components  of  the  indices 
i  6  for  this  neighborhood. 

For  3,  Let  x  6  N{x,,  r),  and  let 

/+  :=  /+(x)n/0(i.), 

r  :=  rix)ni°ix,). 

The  only  choices  are  for  those  i  €  /'^(x,).  Choose 


(  a. 

for  i  e  /^, 

(^2(x.)).  =  {  0 

for  i  €  /“, 

[Xa, 

for  any  A  €  (0, 1],  i  6  /°(x, )  n  I'^{x) 

Then  J2{x*)  6  d{Ax,  —  6)+  and 

{Ax  -  b)+  ~  (i4(x,)  -  6)+  -  J2{x,){-p)  -  0. 


From  Proposition  3.3.5,  one  can  guarantee  a  type  of  perfect  first  order  local  approximation.  For  any  x 
in  this  Proposition  3.3.4  neighborhood  of  x,,  any  generalized  Jacobian  selection  of  d{Ax  -  6)+  can  be  used 
in  the  first  order  Taylor  series  to  perfectly  first  order  model  {Ax*  -  6)+  from  the  point  x.  On  the  other 
hand,  the  reverse  may  not  hold.  There  may  not  be  any  generalized  Jacobian  selection  of  d{Axt  —  6)+  for 
which  the  Taylor  series  based  on  this  generalized  Jacobian  perfectly  first  order  models  {Ax  -  b)+,  for  all  x 
from  this  point  x,.  But  for  each  x,  there  is  at  least  one  generalized  Jacobian  in  d{Ax,  -r  b)+,  for  which  the 
Taylor  series  using  this  generalized  Jacobian  models  {Ax  —  b)+  perfectly.  The  local  approximating  property 
of  Proposition  3.3.5.,  the  first  result,  is  the  critical  property  to  derive  properties  of  local  convergence  to  a 
stationary  point,  such  as  local  convergence  rates. 

This  neighborhood  is  called  the  identification  neighborhood  of  a  point  x. .  In  this  neighborhood,  a 
point  X  has  the  same  strict  inequalities  as  for  x,,  with  the  exception  of  the  indices  for  i  €  /°(a:.),  where 
a  containment  property  holds,  /°(x)  C  /°(x.).  Other  research  has  addressed  this  neigborhood.  Han[1980j, 
Burke[1983|,  and  Burke  and  More(1988]  use  this  property  in  analyzing  algorithms  to  solve  systems  of  linear 
inequalities.  They  state  an  equivalent  form  of  the  property  for  the  linear  case,  stating  that  the  directions 
stay  in  the  null  space  of  the  space  spanned  by  the  coefficients  of  the  active  constraints  of  the  stationary 
point  X,.  They  did  not  develop  this  property  in  the  setting  of  generalized  differential  constructs.  In  section 
5,  we  address  the  class  of  convex  functions  and  develop  the  local  approximation  results  for  this  class  of 
functions,  applying  the  approximation  result  in  a  unified  way  for  common  algorithms.  Robinson  (1990|  refers 
to  the  approximating  property  (3.3.5)  as  a  point-based  linearization  approximating  property,  from  which  he 
develops  local  convergence  properties  of  a  Newton-like  method  to  solve  for  zeroes  of  nonsmooth  functions. 
Robinson  does  not  show  the  details  of  how  to  form  the  point-based  approximation  for  specific  functions, 
rather  he  applies  this  approximating  property  to  anal)^e  general  convergence  properties. 

Denote  any  generalized  Jacobian  of  F  at  x,  denoted  J(x),  to  be 

J{x)  =  ^  ^  '  where  JjCx)  €  d{Ax  -  b)  +  . 
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/2,  being  a  differentiable  convex  function,  is  continuously  differentiable. 

Proposition  3.3.6. 

V /2  is  Lipschitz  continuous  of  rank  ||j4jp  of  order  1  over  all  3?". 

Proof.  For  all  x,  y, 

l|V/2(x)  -  V/2(y)||  =  WA'^iAx  -  b)^  -  A'^iAy  -  fe),|| 

<IM|| 

<  ll^lPt|x-y|!, 

where  the  last  inequality  comes  from  Proposition  3.3.1.  || 

Therefore,  one  can  also  derive  a  Clarke  generalized  Jacobian  set  of  V/2,  or  a  generalized  Hessian  set. 
denoted  3V/2(x).  Let  Q  be  the  set  of  points  at  which  V/2(x)  is  not  differentiable.  Then  dVf2{x)  is  the 
set  all  n  X  n  matrices,  H2(x),  where  H2{x)  is  a  convex  combination  of  a  finite  number  of  limits,  ,  where 
k  —  some  K.  Each  limit  //j  is  obtained  as  the  limit  of  a  sequence  of  the  form  V^/2(y()  for  some 

sequence  j/j  —  x  where  yi  ^  Cl.  That  is, 

Ho  :=  lim  V‘^f2iyi)  where  yi  ^  Cl. 
yi-*x 

Lemma  1. 

If  X,  is  given,  then  there  exists  a  neigborhood  N{x,,r)  of  x,,  such  that  for  x  6  N{x,,r)  and  any 
J2(x)  €  d{Ax  -  b)+  and  J2{x,)  €  3(zlx,  -  f»)+, 

{J2{x)  -  J2{x,))'^{Ax,  -  b)+  =  0. 


Proof.  Let  I'^{x,)  :=  {i|  (oj.x.)  -  6^  >  0}.  If  /‘•'(x.)  =  0,  then  the  result  follows,  so  consider  /'‘‘(x.)  ^  0. 
Let  Hi  :=  {x|  (uj,  x)  -  6,  =  0}  be  the  hyperplane  defined  by  a,,  &<.  Let  r  >  0  be  chosen  so  that, 

r  <  min{  dist  (x,,  Hi)\ i  e  I'^{x,)}. 


If  X  e  iV(x,,  r),  then 


/■^(x.)  C  /■^(x)  =  {t|  {oi,  x)  -bi  >  0}, 
and  for  indices  i  6  /■*■(!,),  ((ai,x)  -  h,)+  is  differentiable  and 


d{{a„x)  -  f>i)+  =  Oi. 


Then 


{J2{x)  -  J2{xtf{Ax,  -  b)+ =  ^  (oi  -  aj)(zlx,  -  6)i  =  0, 

since  for  i  ^  /‘'"(x,),  ((aj,x,)  -  bi)+.  =  0.  || 

Lemma  1  gives  a  neighborhood  which  is  a  superset  of  the  identification  neighborhood  about  x, ,  since 
only  the  indices  in  /^(x,)  are  used  to  define  the  neighborhood.  This  lemma  is  used  in  the  proof  of  Theorem 
1  in  Section  4. 
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Proposition  3.3.7. 

dVfiix)  =  {Jj {x)J2{x)\J2{x)  €  d(Ax  -  b)^}. 

Proof.  Using  the  definition  of  the  generalized  Hessian,  one  can  compute  Hk  obtained  as  the  limit  of  a 
sequence  of  the  form  where  yi  —>  x  and  yt  giving; 

Hk  ;=  lim  V'V2(yi}  where  yi^Q 

yi~*x 

=  lim  J[J2  (yMAyi  -  b)+  +  iyi)J7iyi)  where  yi  ^  Q 
=  lim  J2  {yi)J2{yi)  where  yi  ^  fi, 

yi-*x 

since,  for  i  e  I°{x)  U  /“(x),  limj„_i((a,,2/j)  -  6)+  =  0,  and  for  i  £  /^(x),  (J2(x)),  =  {J2{x)),  -  Oj,  a 
constant,  which  implies 

lim  J(J2^(yt))(^yi-6)+  =0. 
yi  -*x 

Therefore,  a  limit  Hk  is  expressed;  Hk  =  (EkA)^ EkA,  for  some  £*,  which  is  an  m2  x  m2  diagonal  matrbc 
with  elements  e*  on  the  diagonal,  and 


0  if  i  e  /  (x), 

ef  =  •{  1  ifi€/+(x), 

0  or  1  if  t  e  /°(x). 


(3.3.4) 


Note  that  El  =  Ek-  Then 

{EkAfEkA  =  (ElAfA  =  (EkAfA. 

First  show  that  dV /2(x)  C  {Jf  (x)J2(x)lJ2(x)  €  d(Ax-b)+}.  For  A*  satisfying  ^  ^nd  A*  e  |0,  Ij, 

then  H2(x)  €  dVf2{x)  means; 

K 

H2(x)  =  '^XkHk 


fc=i 

K 


=  Y,WkAfEkA 

k=l 

K 

=  {Y,XkEkAfA 

k=l 

K  K 

k=l  k=\ 

K  K 

=  {{^XkEk)^A)^C^XkEk)^A, 


fc=i 


since 


K 


K 


K 


Y^XkEk  =  (j;A*£*)i(^A*£*)i 


fc=i 


fc=i 


fc=i 


because  At  £  [0, 1],  and  Ek  is  diagonal  with  the  property  (3.3.4).  Since  ^k  =  1  and  Ajt  £  (0, 1),  then 


K 


{Y,^kEk)^A&d{Ax-b)+. 


*=i 
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Now,  to  show  containment  in  the  other  direction,  express  J2{x)  =  (J2k=i  ^kEk)^^,  solving  for  Ek  which 
are  diagonal  and  satisfy  (3.3.4)  and  A*  satisfying  =  1  and  \k  €  (0, 1].  This  is  easily  solved  for 

indices  i  €  /+(x)U  /~{x),  since  then  {J2(x))i  is  unique  (and  e*  is  unique)  and  any  choice  of  A*  6  [0, 1]  where 
^k  =  1  works.  For  i  e  /°(x),  one  must  solve  simultaneously  for  some  finite  K,  and  e*  =  0  or  1,  and 
Afc  €  (0, 1|  where  A*  =  1; 

K 

(J2(x))i={(Y,XkEk)^A)i.  (3.3.5) 

k=l 

Since  for  A^  e  (0,  1]  where  ^k  =  1  implies  ((J2k=t  ^kEk)^)i  €  [0, 1|,  and  since  the  positive  square  root 
function  is  a  bijection  on  [0, 1],  then  this  system  is  underdetermined,  implying  (3.3.5)  is  solvable.  Thus,  one 
would  reverse  the  arguments  in  the  other  direction,  given  any  generalized  Jacobian  J2(x)  e  d(Ax  —  b)^.  then 
some  H2(x)  e  dV/2(x)  exists  si  i  that  //2(x)  =  dJ(x)J2(x).  || 

Proposition  3.3.7  gives  one  convenient  way  to  choose  a  generalized  Hessian  of  /2(x)  using  any  selection 
of  J2(x)  e  d(Ax  —  b)+.  Note  that  Jj (x)J2(x)  is  symmetric,  positive  semi-definite.  Also,  for  any  x,p  6  9t" 
and  any  choice  of  J2{x)  €  d{Ax  -  6)  +  ,  it  follows  that  |(p,  Jj(x)J2{x)p)\  <  ||A|pl|p|p. 

Proposition  3.3.8. 

Let  X,  be  given  and  let  N{x,,r)  be  a  Proposition  3.3,4  identification  neighborhood.  Let  x  e  jV(x,,r) 
and  let  p  :=  x,  -  x.  Then 

1.  for  any  J2{x)  €  d(Ax  ~b)  +  , 

/2(x  +p)  -  Mx)  -  (p,  V/2(x))  -  ^(p,dj’ (x)J2(x)p)  =  0; 

2.  for  any  J2(x.)  e  d(Ax,  -  b)+, 

Ihix)  -  /2(x.)  -  {(-p),V/2(x.))  -  ^(p.  dj” (a;.)^2(x.)p)|  <  ^|(i4/o(^.)P,  (A/o(j,.)  +2A/+(j,.))p)|; 

3.  for  each  particular  x  e  N{x,,r)  there  is  at  least  one  J2(x.)  €  d(Ax,  -  6)  +  ,  such  that 

f2{x)  -f2(x.)  -  ((-P),V/2(X.))  -  (x,)d2(x.)p)  =  0. 

Proof.  For  1,  Proposition  3.3.5  means  that  F2(x  +  p)  =  F2(x)  +  J2[x)p.  Therefore,  taking  inner 
products: 

(f2(x  +  p),F2(x  +p))  =  (F2(x)  +  J2(x)p,  F2(x)  +  J2(x)p), 

/2(X  +  p)  =  /2(x)  -I-  (J2{x)p,  F2(x))  +  ^{p,jJ {x)J2{x)p). 

For  2,  this  follows  from  the  property  that  all  the  error  is  in  the  components  having  indices  t  €  /°(x, )  and  is 
all  contained  in  the  term:  ^{p,Jj(x,)J2(x,)p). 

1  rj. 

-(p.  {x.)J2{x.)p)  = 

2  *^||>(i.)P)  +  {Ai*(x,)P,  Ai+(x.)P)  +  2(J;o(j.^)P,  A/+(i.)p)  j  . 
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Therefore,  since  for  i  6  /°{x.).  iMx,))!  =  Kai,  for  some  Aj  e  (0. 1],  then 

l/2(x)  -  /2{x.)  -  {(-p),  V/2(x.)>  -  (x.)J2(x,)p)l  <  +  2.4/.  )p)  | . 

For  3,  choose  J2(x,)  as  in  Proposition  3.3.5.  Then  the  result  follows,  just  like  in  the  first  result  of  this 
proposition.  |1 

Denote  r,(x)  to  be  the  Hessian  matrix  of  (Fi(x))„  and  denote  ^(x)  as  the  gradient  of  /(x).  Denote 
ffi(x),  ff2(x)  =  A'^(Ax  -  b)  +  ,  the  gradients  of  /i(x)  and  /2{x),  respectively.  A  choice  H{x)  e  dV/(x).  a 
generalized  Hessian  of  /(x),  is  determined  by  the  choice  of  generalized  Jacobian  of  (Ax  -  6)*.  Then 

g{x)  =  J{x)^ F(x)  =  Jj Fx{x)  +  A'’'{Ax  -  6)+  =  £/i(x)  +  g^ix), 

H(x)  =  J{xfj{x)  +  Q(x)  =  Jx(xfMx)  +  MxfMx)  +  Q{x), 

ml 

where  Q{x)  ;  =  E  (Fi(x)).r.(x). 


I 

i 
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4  Generalization  of  Algorithms  for  Solving  a  System  of  Linear  Inequalities 

In  this  section,  we  develop  algorithms  to  solve  problem  (3. 1 .2) .  The  specific  algorithms  are  adaptations  of  the 
Gauss-Newton  and  Levenberg  Marquardt  methods,  based  on  using  a  selection  of  the  generalized  Jacobian  at 
each  iterate.  In  section  4.1,  the  Gauss-Newton  like  algorithm  is  defined  and  local  convergence  properties  are 
analyzed,  and  the  conditions  for  local  q-quadratic  convergence  are  verified.  The  Levenberg-, VI arquardt  t>q)e 
algorithm  is  developed  and  the  conditions  for  local  convergence  are  stated.  In  section  4.2.  global  convergence 
methods  are  discussed.  The  Armijo  line  search  algorithm  is  analyzed.  The  global  trust  region  method  is 
adapted  to  solve  (3.1.2)  using  a  generalized  Hessian  model  function,  and  the  conditions  for  convergence  are 
verified. 

4.1  Local  Convergence  Results 

The  properties  of  the  algorithm  along  with  the  properties  of  the  specific  function  being  minimized  guarantee 
local  convergence  in  some  neighborhood  of  the  solution  or  stationary  point.  We  show  local  convergence 
properties  for  the  undamped  search  directions,  the  solutions  d  to  Bd  =  -g(x),  where  g{x)  —  Vf{x)  and  B 
is  the  matrix  determined  by  the  specific  algorithm.  See  section  1.5. 

4.1.1.  Adaptation  of  Gauss-Newton  Directions:  Iterates  Xk  are  generated  by 

1**1  =  X*  4-  d* 


where  d*,  the  search  direction,  solves 


d*  d*d*  g{^Xk)  —  Jk  Bki 


where  J*  is  a  generalized  Jacobian  of  F  at  x*.  Local  convergence  of  the  Gauss-Newton  method  in  some 
neighborhood  of  the  solution  to  (3.1.2)  is  assured  under  the  conditions  of  Theorem  1.  Compare  to  Theorem 
10.2.1  of  Dennis  and  Schnabel  [1983]  for  the  smooth  case. 

Theorem  1.  Local  Convergence  of  Gauss-Newton  for  Linear  Inequalities 

Let  /  and  F  be  defined  as  in  section  3.1,  and  assume  the  following; 

1.  /i(x)  is  continuously  differentiable  in  an  open  subset,  D  C  3?”  and  Ji(x)  is  Lipschitz  continuous  on  D 
of  rank  7  and  order  1 . 

2.  There  exists  x,  e  D,  such  that  ^(x,)  =  0. 

Assume  there  exists  a  neighborhood  N(x,,r)  c  D,  such  that  the  following  are  satisfied: 

3.  ||Ji(x)||  <  a,  for  all  x  €  N{x,,r). 

4.  There  exists  a  >0,  such  that  for  ail  x  e  iV{x,,  r), 

||(Ji{x)  -  Ji(x,))'^Fi(x.)||  <crl|x-x.||.  (4. 1.1.1) 
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5.  A,  the  smallest  eigenvalue  of  ),  for  any  choice  of  generalized  Jacobian,  satisfies: 


A  >  cr. 


Then,  for  any  c  e  (1,  ^),  there  exists  c  >  0.  such  that  for  all  xo  6  jV(x,,  t),  the  Gauss- Newton  iterates 


Xfc  -  (J(zt)'^J(xfc))  ^J(xk)^F(xk), 


are  well-defined  and  converge  to  x,  and  satisfy: 

l|a;*:+i  -X,||  <  yl|xt  -x,||  -i-  (4. 1.1.2) 

11^*+ 1  “  II  ^  ll^fc  ~  II  <  II-  (4. 1 . 1 .3) 

Proof.  There  are  two  conditions  different  from  those  of  Dennis  and  Schnabel.  J2(x)  may  not  be  Lipschitz 
continuous  on  D,  and  Jj(x)  J2(x)  may  not  be  unique.  Lemma  1  and  Proposition  3.3.5  show  that  there  exists 
a  neigborhood  N(x,,r,)  of  x, ,  such  that  for  x  e  N{x,,r,)  and  for  any  choice  of  generalized  Jacobian  of 
FzW, 

iMx)  -  T2(ar.))^{2lx,  -  b)+  =  0, 

F2(x.)  -  F2(x)  -  J2{x)(x,  -  x)  =  0. 

This  implies  that  Jj{x){F2{x,)  -  F^ix)  —  J2(x){x,  -x))  =  0.  Choose  r  :=  min(r,r.). 

For  the  proof  of  the  theorem,  choose  some  fixed  c  €  (L^)-  By  an  argument  similar  to  Proposition  1.3.1, 
since  J^(x,  )J(x,)  is  invertible  for  any  choice  of  generalized  Jacobian,  there  is  a  neighborhood  of  invertibility, 
such  that  for  any  x  e  /V(x,,€i)  eind  for  any  choice  of  generalized  Jacobian, 


||(j’’(x)J(x))-i||  < 


since  c  >  1 .  Let 


Choose  Xo  €  N{x,,t)  and  then 


€  :=  min 


X  ~  CO 
ca-f 


j|{j^ixoy(xo))-'||<^. 


Using  induction,  consider  the  case  for  k  =  0.  Then  at  the  first  step,  xi  is  well  defined  and 

Xi  -  X.  =  Xo  -  X,  -  (j'^(xo)J(xo))“‘ J^(xo)F(io) 

=  ( J^(xo)T(xo))~‘[J^(xo)F(xo)  +  J^(xo)J(xo)(x,  -xo)| 

=  (J^{xo)J(xo))-'M^(xo)F(x.)  -  J^(xo)(F(x.)  -  F(xo)  -  J(xo){x.  -  xo))| 

-  (J^(xo)J(xo))-’[J^(xo)F(x.)  -  J,Vo)(F,(x.)  -  F,{xo)  -  J,(xo)(x.  -xo)) 
+  Jj (xo){F2(x.)  -  F2(xo)  -  J2(Xo){x,  -  Xq))]. 


(4.1. 1.4) 


(4. 1.1.5) 
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By  Proposition  3.3.3,  given  the  Lipschitz  condition  on  J\. 


\Fi{x,)  -  Fi(xo)  -  yi(xo)(x,  -  xo)j|  <  -||xo  -  x.j|^. 


(4. 1.1.6) 


Since  (x,)F{x,)  =  0,  and  by  Lemma  1,  Proposition  3.3.5,  and  condition  4,  then 


II  {xo)Fl(x,)  -  Jj (xo)(F2(x,)  -  F2(xo)  -  J2{xo){x,  -  Xo))!|  <  ol|x.  -  Xo||. 


(4. 1.1.7) 


By  condition  3,  that  ||Ji(xo)||  <  a  and  above,  it  follows  that 

l|xi  -  x.ll  <  ||(J^(xo)J(xo))-'ll  [|lJr(xo)F,(x.)  -  Jj {xo)(F2{x,)  -  FjIxq)  -  J2(xo)(x.  -xq) 

F  mi(a:o)l|  llFi(x,)  -  Fi(xo)  -  Ji(xo)(x,  -  xo)l|l 


< 


^  *”<7(1x0  -a:,l(  + 


which  proves  (4. 1.1.2)  for  k  —  0.  From  above  and  the  choice  of  e, 

\\xi  -  x.ll  <  ||xo  -  x.ll  y  +  -  X.ll] 

<  ||io  -  X, 

C<7  -1-  A, 


ca  A  —  cctI 
_  +  -yyj 


2A 

^  It^o  x#  ||, 


Ijxo  -x.| 


which  proves  (4. 1.1. 3)  for  A:  =  0.  The  induction  step  is  proven  in  similar  fashion,  with  substitution  of  A  +  1 
for  “1”  and  k  for  “0”.  || 

If  Fi  (x. )  =  0,  then  a  can  be  chosen  to  be  0,  and  the  convergence  rate  is  q-quadratic  in  this  neighborhood. 
<7  gives  a  measure  of  the  size  of  the  residual  of  Ft  at  x,.  If  F  is  linear  in  some  neighborhood  of  x,,  then  <7 
can  be  chosen  to  be  0,  since  in  this  neighborhood,  J(x)  —  J(x.)  =  0. 

4.1.2  Adaptation  of  Levenberg-Marquardt  Direction:  Given  a  generalized  Jacobian  Jk  of  F  at  each 
iterate  x*.  the  search  direction  dk  solves; 


(Jl  Jk  +  Hkl]dk  =  -g{xk)  =  -JkFk. 

where  nk  is  a  non-negative  scalar  which  is  cho.sen  so  that,  among  other  properties,  [J^ Jk  +  p/t/)  is  positive 
definite.  This  method  is  well-defined  even  when  Jk  does  not  have  full  rank.  There  is  a  sequence  of  iterates 
{xit},  and  there  is  a  sequence  of  non-negative  .scalars  {p*}.  There  are  different  techniques  to  choose  p*  so 
that  at  each  iteration.  {JjJk  +  p*/)  is  positive  definite.  One  technique  is  the  trust  region  method,  which 
is  discussed  in  section  4.2.  Theorem  2  below,  using  the  same  conditions  as  Theorem  1,  and  the  condition 
that  the  p*  are  bounded,  gives  local  convergence  properties  for  Levenberg-Marquardt  directions.  See  for 
example,  Dennis  and  Schnabel  (1983],  Theorem  10,2.6  for  the  smooth  case. 
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Theorem  2.  Local  Convergence  of  Levenberg  Xfarquardt  for  Linear  Inequalities 


Let  the  conditions  1-5  of  Theorem  1  be  satisfied.  Let  the  .sequence  of  non-negative  scalars,  {/Jfc},  be 
bounded  by  M  >0.  If  <7  <  A,  then  for  any  c  €  (1.  th^re  exists  e  >  0.  such  that  for  all  xo  €  jV(x,,  f ), 

the  Levenberg- Marquardt  iterates: 


xii .  I  :  ^  Xk  —  -^k  L  Hkl)  ^JkF'k- 


are  well  (.iefined  and  obev: 


Ikfc- 


:r.ll  < 


c(<T  +  M] 


Ik*  --r.il  + 


2(X  +  M) 


Xfc  -  X, 


c((T  -I-  A/)  +  A  -t-  M 

i|-r*.i  -  x.ll  s  - - Ik*  -x.ll  <  ||x*  -x.ll. 

Proof.  Choose  a  neighborhood  .V(x,,  r)  about  x,  as  in  Theorem  1.  Since  cr  <  A,  choose 


c  e 


/ 

V  kr  -1-  A  ] 


Then  c  >  1  Choose  such  that  Jo  is  invertible  and  satisfies  for  xq  €  N{x,,€\): 

||(J,f  Jo  +  Mo/)“‘ll  < 


Then  let 


e  :=  min  <  r,  ei. 


A  -t-  A/  -  c{a  -L  A/) 


CO  7 


The  proof  then  follows  the  same  pattern  as  Theorem  1. 
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4.2  Global  Convergence  Results 

There  are  two  basic  approaches  to  assure  that  iterates,  starting  from  any  initial  location,  converge  to  a 
stationary  point.  These  are  the  inexact  line  search  methods  and  the  trust  region  methods.  In  section  1..5, 
the  Armijo  line  search  properties  and  the  global  trust  region  method  are  described  for  the  smooth  case.  It  is 
usi'.ally  appropriate  to  apply  the  line  search  method  in  combination  with  steepest  descent  and  quasi-Newton 
search  direction  methods,  since  these  methods  produce  descent  directions,  directions  d  such  that  (d,  g{x))  <  0. 
Global  trust  region  methods  derive  the  step  (scaled  direction).  If  the  model  function  is  the  second  order 
Taylor  series  approximation,  then  one  might  consider  the  direction  to  be  a  Newton-like  direction,  in  that  it  is 
derived  using  Hessian  information.  If  the  model  function  uses  [x)J{x)  to  approximate  the  Hessian,  then 
direction  is  related  to  a  Gauss-Newton  direction.  If  the  model  function  uses  J^{x)J[x)  +  to  approximate 
the  Hessian,  then  the  direction  is  related  to  the  Levenberg-Marquardt  direction. 

4.2.1  Inexact  Line  Search;  We  analyze  properties  of  the  function  /,  given  by  (3.1.2).  necessary  for  the 
Armijo  line  search  to  assure  global  convergence  to  a  stationary  point.  See  the  Armijo  line  search  algorithm, 
(1.5.2).  Notice  that  the  function  /,  given  by  (3.1.2),  is  bounded  below  by  0.  Further,  V/2  is  uniformly 
Lipschitz  of  rank  ||A|p  on  all  5ft”  A  well  known  result,  due  to  Wolfe  [1969, 1971),  shows  the  conditions  for 
the  global  line  search  to  converge  to  a  stationary  point. 


Theorem  3.  Wolfe's  Global  Line  Search  Theorem.  Let  be  order  1  Lipschitz  continuous  on  {i|  f(x)  < 
/(xo)}  of  rank  7.  Suppose  an  algorithm  produces  Armyo  stepsizes,  A*,  with  descent  directions  d^.  satisfying 
for  all  k: 


(V/(xit),  Afcdfc)  <  0  or  d*=0. 


If  f  is  bounded  below  and  the  angle  between  d^  and  -V /(xfc)  is  uniformly  bounded  away  from  orthogonality 
then,  either: 

some  k  or  lim  V/(xife)=0. 

fc— *OC 

The  function  /,  given  by  (3.1.2),  satisfies  the  conditions  of  Wolfe’s  Theorem,  if  V/i  is  Lipschitz  contin¬ 
uous  on  {x|  /(x)  <  /(xo)}  of  some  given  rank,  say  71.  Therefore,  no  adjustment  is  needed  in  the  global  line 
search  method  to  account  for  the  generali.'ed  Jacobian  of  /2.  If  a  search  direction  method  is  chosen  which 
assures  descent  directions,  and  if  the  method  assures  that  the  directions  and  the  gradients  are  uniformly 
bounded  away  from  orthogonality,  then  the  Armijo  line  search  assures  global  convergence  to  a  stationary 
point.  Such  direction  methods  could  be  Levenberg-iMarquardt  or  a  positive  definite  quasi-.Newton  method. 

4.2.2  Global  Trust  Region  Method;  See  (1.5.3)  and  (1.5.4).  To  analyze  the  global  trust  region  method, 
one  considers  the  model  function  and  how  accurately  it  approximates  the  function,  /.  However,  higher 
accuracy  implies  more  difficult  computations  to  form  and  solve  subproblem  (1.5.3).  We  consider  two  model 
functions,  both  using  generalized  Jacobians  for  forming  the  model.  Suppose  H{x)  is  a  generalized  Hessian 


35 


of  /(i),  and  J{x)  is  a  generalized  Jacobian  of  f{x).  Let 

M{s)  :=/(x)  +  {g{x),  s)  +  //(x)s)  be  the  Hessian  (or  Newton)  model, 

I  r 

L(s)  :=/(x)  +  {g{x),  s)  +  -{s,  J  (x)J(x)s)  be  the  Gauss-Newton  model. 

Consider  the  generalized  Hessian  model  first.  The  method  finds  a  specific  step  s  to  satisfy  agreement  between 
the  model  function  and  the  function  /.  An  initial  >  0  and  ii  is  chosen.  Using  the  current  iterate  function 
value,  gradient,  and  the  generalized  Hessian  at  xk,  form  the  model  function  Mk{s).  The  subproblem  below 
is  solved  for  Sk  ■ 

min  M/t(s)  subject  to  lls||  <  6k-  (4.2.2. 1) 

S 

For  theoretical  convergence  results,  assume  that  the  subproblem  is  solved  exactly,  s*  gives  the  step  which 
minimizes  the  Hessian  model  (4.2.2. 1)  in  a  bounded  ball  about  x*.  How  does  the  reduction  in  the  /  compare 
with  the  reduction  in  the  model  function?  Calculate  the  ratio: 

f(xk)  -  fjxk  +  Sk)  _  /(Xfc)  -  /(Xfc  -I-  Sk)  14  2  2  21 

■  Mk{0)  -  Mk(s)  /(Xfc)  -  Mfc(sfc)  ■  ^  ^ 

Define  the  step  parameters  and  the  trust  region  size  parameter,  the  same  as  in  section  1.5,  denoting  the 
iterate  subscript  k.  Let 

A/fc  :=  /(xfc)  -  /(xfc  +  Sk), 

AMk  :=  Mfc(O)  -  Mfc(sfc). 

1)  If  Tfc  <  .25  ,  then  set  6fc+i  :=  ||sfc||/4. 

If  Tfc  >  .75  and  ||sfc||  =  6k,  then  set  5fc+i  :=  26k, 
otherwise,  set  5fc+i  :=  ^fc. 

2)  If  Cfc  <  0,  then  set  Xfc+i  :=  Xfc;  else,  set  next  Xfc+i  :=  Xfc  -|-  Sfc. 

Steps  which  reduce  the  objective  function  are  called  successful  steps.  The  others  are  called  null  steps. 
If  Tfc  <  .25,  then  the  step  is  called  a  region-reducing  step.  One  may  consider  choosing  a  different  generalized 
Hessian,  if  there  is  a  choice,  when  taking  a  null  step  or  a  region  reducing  step.  If  there  is  an  infinite  sequence 
of  iterates,  then  either  the  region  reducing  steps  have  an  infinite  subsequence  or  the  steps,  where  r  >  .25, 
have  an  infinite  subsequence  or  both.  Suppose  there  is  an  infinite  number  of  iterates.  If  6k  — ►  0,  then  an 
infinite  number  of  iterates  must  satisfy,  rk  <  .25,  in  order  that  6k  0,  meaning  that  there  must  be  an 
infinite  subsequence  of  region  reducing  steps.  If  there  is  an  infinite  number  of  iterates  and  inffc  6k  >  0,  then 
there  must  be  an  infinite  subsequence  of  iterates,  k,  where  Xfc  >  .25. 

A  first  order  condition  for  x»  to  solve  (3.1.2)  is  that  V /(x.)  =  0,  or,  equivalently,  for  all  p,  (V/(x,  ),p)  2: 
0.  A  point  X,  satisfies  second  order  minimality  conditions  if  for  every  generalized  Hessian  at  x,  and  every 
p,  (p,  H{x,)p)  >  0.  Define  “little  o”  notation,  o(*),  which  means  that  a  =  o{h)  <=^  ^  ♦  0,  as  /i  — >  0.  We 
use  this  notation  to  discuss  order  of  convergence,  such  as  linear  or  quadratic. 
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Proposition  4.2.2. 1. 


Let  5  be  a  bounded  set.  Let  /i  be  on  3?”.  and  V^/i  be  Lipschitz  continuous  of  rank  7  on  S.  Let 
^  ^nd  let  lisfcll  —  0  and  x/t  — *  a:*-  Then  there  exists  an  infinite  subsequence  of 

iterates  x^,  A:  €  £  satisfying, 

\f{x,+s,)-Mk{s,)\<l\\skf. 

Proof.  £  denotes  the  set  of  indices  of  the  subsequence.  Since  /i  is  C^,  and  its  Hessian  is  Lipschitz 
of  rank  7  on  5,  one  can  approximate  fi  second  order  with  the  first  three  terms  of  the  Taylor  series.  This 
property  is  easily  verified.  Given  7,  the  Lipschitz  constant  for  V^/i  on  5,  let  x.x  +  p  €  S,  then 

1/1  (x  +p)  -  /i(x)  -  {V/i(x),p)  -  -^(p,  VVi(a:)p>|  < 

l!  Uo  “  ’^Vi{x)||  lltpll  dt)  IIpII  dt  < 

lo  Uo 

For  the  proof  of  the  proposition,  since  Sk  — >  0,  let  Ki  be  sufficiently  large  such  that  for  some  k  >  K\, 
Proposition  3.1  assures  an  identification  neighborhood  N{xk  +  Sk,  rk)  about  Xn  +  Sk,  such  that  x*  €  N{xk  + 
Sk,rk)-  By  hypothesis  that  V^/i  is  Lipschitz  continuous  of  rank  7  on  5  and  the  above  result,  then 

1/1  (xfc  4-  Sk)  -  Mxk)  -  (V/i(x*),Sfe)  -  i(s*,  VVi(x)st)|  1  IWskf. 

By  Proposition  3.3.8,  forx*  €  Af(xfc+Sfc,rfc)  and  for  any  selection  of  generalized  Hessian  H-^ix)  =  Jj(x)J-2{x), 
then 

1  r 

fliXk  +  Sk)  -f2{Xk)  -  (Sfc,V/2(Xfc)>  -  -{Sk,J2  {xk)Mxk)sk)  =  0. 

This  implies  for  this  k  >  Ki,  that  the  part  of  the  model  for  /2,  denoted  M^,  satisfies: 

M^isk)  =  f2(xk  +  Sit)- 


Therefore, 

\f{xk  +  Sk)  -  Mk{sk)\  <  ^llsfcll^. 

Let  this  k  £  C. 

Now,  since  {xt}^j  is  an  infinite  sequence,  is  an  infinite  subsequence.  Since  ||s/t||  — >  0,  find 

K2  sufficiently  large  such  that  for  some  k  >  K2  >  Ki  one  can  again  form  an  identification  neighborhood  as 
above.  Repeat  this  process,  forming  an  infinite  subsequence  of  indices  fc  e  £.  || 

Proposition  4. 2.2.1  means  that  /(x*  4-  s*)  —  Mk(sk)  =  o(||sit|P)  for  k  €  £. 

We  state  and  prove  the  global  convergence  theorem,  which  is  a  generalization  of  Fletcher[1987]. 
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Theorem  4.  Generalized  Hessian  Model  Global  Trust  Region  Convergence 


Let  /  be  given  as  in  section  3.1.  For  the  trust  region  method  applied  to  the  generalized  Hessian  model, 
let  e  5,  where  S  is  bounded.  If  /i  is  on  5,  and  its  Hessian  is  Lipschitz  of  rank  7  on  S,  then 

there  exists  an  accumulation  point  x,  which  satisfies  first  and  second  order  optimality  conditions. 

Proof.  If  inf  Sk  =  0.  then  by  the  discussion  above,  there  must  be  an  infinite  subsequence  of  region 
reducing  steps,  and  hence,  steps  where  rk  <  .25.  Denote  £i  :=  <  .25},  where  inf  <5*  =  0.  If  inf  Sk  >  0, 

there  must  be  an  infinite  subsequence  of  iterates  such  that  >  .25.  Denote  £2  :=  >  .25).  At  least 

one  of  the  two  cases  occurs,  and  by  S  being  bounded,  the  infinite  subsequence  has  an  accumulation  point, 
which  is  denoted  x, .  Consider  the  two  cases: 

1.  Vk  <  .25,  6k  —>0  and,  hence  Hs^H  — *  0,  fc  e  £1, 

2.  rk  >  .25  and  inf  Sk  >  0,  k  €  £2. 

Case  1:  Let  g,  denote  V /(x*).  Argue  by  contradiction.  Suppose  there  is  a  descent  direction  p  at  x.,  with 
IIpII  =  1  such  that 

(p,5.)--a<0.  (4.2.2.3) 


By  Proposition  4.2.2. 1,  there  exists  an  infinite  subsequence  of  iterates  Xk,  k  e  £  C  £1,  such  that  for  fc  e  L 

f{xk  +  Sk)  =  Mkisk)  +  o(i|sfclp), 

(4.2.2.4) 

Afk  =  AMk-bo{\\skf). 

Consider  a  step  of  length  tk  =  ||sfc||  along  p.  Since  for  any  choice  of  generalized  Hessian,  Hk  is  uniformly 
bounded  on  5,  and  by  optimality  of  (4.2.2. 1),  it  follows: 

AMfc  >  MkiO)  -  Mk(ekP)  =  -{9k, ^kP)  -  {ckP,  HktkP) 

=  -{9k, ^kP)  +  o{tk) 

=  tkOc  +  o(efc), 

by  continuity  of  (•,•)  and  (4.2.2. 3).  Forming  the  quotient  rk,  since  CkO  +  o(ek)  <  £^Mk,  for  k  e  £,  then 

_  A/* 

■  AA/fc 
^  AMk  +  0(4) 

AMk 

=  1  +  ^ 

AM* 

Since  e/t  — ►  0  and  by  (4. 2.2.4),  then  r*  — ►  1,  which  contradicts  that  rk  <  25.  Thus,  it  must  be  that  p,  =  0. 

Again,  arguing  by  contradiction,  if  not  all  generalized  Hessians  at  x,  are  positive  semi-definite,  then 
there  must  be  at  least  one  second  order  descent  direction.  We  choose  a  specific  normalized  descent  direction 
d,  along  with  a  particular  generalized  Hessian  //.  at  x,,  for  which  {d,H»d)  <0  is  furthest  from  zero.  Let  d 
or  its  negative  be  a  descent  direction  with  l|d||  =  1,  and  the  generalized  Hessian  H,  at  x,  solve: 

mm  (p,  //(x,  )p) 


subject  to:  H(x,)  €  3V/(x,) 

IIpI!  =  1. 
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A  solution  pair  (d,  //,)  for  the  above  problem  exists,  because  of  the  compactness  of  the  generalized  Hessian 
set,  5V/(x.),  and  since  we  assume  at  least  one  such  direction  and  generalized  Hessian  exists,  by  our  argument 
by  contradiction.  Therefore,  for  the  solution  (d,  //, )  of  above,  (d,  H,d)  =  —a  for  some  a  >  0.  Then  for  all 
H(x,)  and  for  all  directions  p,  such  that  ||pll  =  1, 

(p,  H(x,)p)  >  (d,  //.d)  =  -Q  <  0.  (4.2,2..5) 

Notice  for  /:  e  £  C  £i,  for  any  other  generalized  Hessian  of  /2,  denoted  H'^,  and  its  associated  model 
function,  denoted  ,  at  x*  +  that 

+  Sk)  =  Mlisk)  =  M^'isk). 

For  k  £  C  C  £i,  choose  a  step  length  of  tk,  for  direction  ad,  where  a  =  ±1,  so  that  {gk,ad}  <  0. 
Notice  that  ad  also  solves  (4. 2. 2. 5).  Now,  by  the  upper-semicontinuity  of  the  generalized  Hessian  map, 
Hk  — »  //(x.)  €  3V/(x*).  Then  by  optimality  of  (4.2.2. 1),  and  by  the  upper  semi-continuity  of  the  generalized 
Hessian, 

AM/b  >  Mk(0)  -  Mk(tk(^d) 

>-yi{d,Hkd} 

=  -^-el{d,H(x,)d)  +  o(tl) 

>  ^4o‘  +  o(€l). 

Again,  one  can  form  the  quotient  ratio  r*,  as  above,  and  from  (4. 2.2.4),  then  r*  — ♦  1  which  contradicts  that 
Vk  <  .25.  Thus,  H,  is  positive  semidefinite,  and  any  choice  of  generalized  Hessian  at  x,  is  also  positive 
definite.  That  is,  for  any  direction  p,  such  that  ||p|(  =  1,  it  follows  that,  for  any  A  >  0 

0  <  (d,  H.d)  <  (p, //(x.)p) 

0  <  (Ad,  //.Ad)  <  (Ap,  //(x.)Ap). 

Case  2:  In  this  case,  there  is  a  subsequence  k  €  £2,  where  f\  -  f*  -  YltLiifk  -  fk+i  )  -  - 

.25  Ylk^Ci  ^^^k,  since  rk  >  -25  for  k  €  £2-  Since  /  is  bounded  on  5  and  x,  e  5,  then  /i  -  /,  is  uniformly 
bounded,  and  it  must  be  that  AAfk  — *  0.  Let  S  satisfy:  0  <  6  <  inf  6*.  Let  s  and  H, ,  a  particular  generalized 
Hessian  of  /  at  x,,  solve; 

min  {s,H(x,)s) 

subject  to:  //(x.)  6  9V/(x,)  (4. 2. 2.6) 

11*11  < 

(4. 2. 2. 6)  has  a  solution  since  the  generalized  Hessian  set  is  compact.  Define  M,(s)  ;=  /.  -I-  (s,p,)  +  5(5.  H,s). 
Define  x  ;=  x,  +  s.  For  sufficiently  large  k.  by  choice  of  6,  and  since  x.  is  a  cluster  point  of  {xfc|  fc  e  £2}, 
then 

l|i  -  -Tkll  <  Pll  +  ||xt  -  x.||  =<  6  -I-  ||x*  -  x.ll  <  6k. 
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This  implies  i  -x*  is  feasible  for  the  subproblem  (4.2.2. 1).  Thus, 


A/fc(x  -  Xfc)  >  Mfc(s/fc)  =  fk-  AA/ji, 

since  s*  solves  the  subproblem  (4. 2. 2.1).  In  the  limit,  fk  —  /.,  gk  —*  5.,  AA/fc  — *  0,  and  x  —  x^  — *  s.  Since 
//,  solves  (4. 2. 2. 6),  then  for  any  other  H(x,),  define 

A/'(s)  ;=  /,  +  {s,g,)  +  ^(s, //(x,)s)  >  A/,(s), 

and  then 

M:{s]  >  M.{s]  >f,  =  A/.(0)  -  A/.'(0). 

Thus,  s  =  0  also  minimizes  M,{s)  on  Ikll  <  6.  Since  the  constraints  are  not  active  at  s  =  0,  then  first  and 
second  order  optimality  conditions  must  be  satisfied  for  the  subproblem  (4.2.2. 1).  That  is,  g,  =  0.  and  H, 
must  be  positive  semi-definite.  Therefore,  on  ||s||  <  6  and  for  any  generalized  Hessian  H(x,)  at  x,,  then 

0  <  (s,  //,,  s)  <  {s,  H(x,)s}. 

This  implies  that  H{x,)  is  also  positive  semi-definite.  || 

If  the  lower  level  set  of  the  first  iterate  is  bounded  and  since  the  function  values  at  the  iterates  decrease 
or  stay  the  same,  then  he  existence  of  the  bounded  set  S  required  for  the  theorem  is  satisfied.  The  coercive 
properties  of  a  function  for  which  this  is  satisfied  are  discussed  in  section  4.3.  We  show  in  section  4.3,  that 
this  condition  is  satisfied  for  the  specific  function  /  determined  by  (2.5.1).  If  the  sequence  of  iterates  are 
finite,  that  is  AAf*  =  0  for  some  k,  then  first  and  second  order  optimality  conditions  are  satisfied,  by  an 
argument  similar  to  the  one  in  case  2  of  Theorem  4. 

We  now  consider  the  Levenberg-Marquardt  method  based  on  a  trust  region  method.  Let  x/t  €  3?"  and 
J{xk]  be  a  generalized  Jacobian  of  /  at  Xk-  The  generalized  Jacobian  model  function  for  the  non-linear  least 
squares  problem  is: 

1  T 

M{xk  -t-  Sk)  :=  /(xfc)  -I-  {Vf(xk),Sk)  +  {xk)J{xk)sk)- 

Let 

Bk  :=  J{xkfJ{xk). 

In  solving  for  the  model  trust  region,  one  wishes  to  find  the  direction  of  norm  less  than  or  equal  to  some 
6  >  0,  such  that  in  this  direction,  there  is  agreement  between  the  decrease  in  the  model  function  and  the 
decrease  in  /.  Find  s*  which  solves: 

min  f(xk)  + {Vf(xk),s)  +  ^{s,j'^{xk)J{xk)s)  , ,  r,  ^ 

^  ^  (4.2.2.() 

subject  to:  (s,  s)  <  . 

The  following  proposition  shows  the  conditions  for  optimality  and  uniqueness  of  a  solution  of  (4. 2. 2. 7).  This 
proposition  is  found  in  most  elementary  material.  See,  for  example,  Fletcher]  1987], 
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Proposition  4.2.2.2. 


Sfc  solves  (4. 2. 2. 7)  ■;=>  (sk,  s/t)  <  6“^  and  there  exists  /i  >  0,  such  that 


(fifc  +  n[]sk  =  -Vf{xk). 


Sfc)  -  6'^)  =  0. 


(4. 2.2. 8) 


Furthermore,  if  is  positive  definite,  then  is  the  unique  minimizer. 

Proof:  Since  Bk  is  positive  semi-definite,  (4. 2.2.7)  is  a  convex  problem.  (4. 2.2. 8)  are  the  KKT  condi¬ 
tions.  The  uniquenss  of  Sk  is  assured  by  the  strict  convexity  of  the  Lagrangian  of  (4. 2. 2. 7): 


(V/(xfc),s)  +  ^{s,{Bk  +  t^I)s). 


For  the  case,  where  n  is  chosen  so  that  Bk  -f-  nl  is  positive  definite,  there  is  a  unique  Sk-  Given  a  specific 
6,  there  is  an  associated  which  solves  (4.2.2.7).  Th'^  larger  the  6,  then  the  smaller  that  /x  is.  Thus,  the 
problem  becomes:  find  the  8,  as  large  as  possible,  while  still  assuring  Bk  I  fil  is  positive  definite.  The  trust 
region  algorithms  do  not  solve  for  ^  exactly.  Instead,  an  approximation  process  is  used,  where  6  is  decreased 
or  increased  using  the  trust  region  parameters  outlined  for  the  generalized  Hessian  model.  This  approach 
compares  the  changes  in  the  model  function  and  the  function  /,  and  adjusts  6  and  Sk  appropriately,  the 
same  as  for  the  generalized  Hessian  trust  region  method. 

By  choosing  J^ixk)  appropriately,  one  may  avoid  the  need  for  /x*  >  0  at  some  iterates.  For  the  function 
/  given  by  (3.1.2),  choose  a  specific  generalized  Jacobian  at  x*,  denoted  J*  where  for  indices  i  €  P{xk), 
choose  (J^),  =  a,.  If  for  a  direction  a,  such  that  {s,ai)  /  0  for  at  least  one  i  c  I°{xk),  then  for  any  other 
choice  of  generalized  Jacobian  J2{xk)> 

{J^S^J^S)  >  {J2(Xk}s,J2(Xk)s). 

The  model  function  based  on  the  Hessian  approximation  J^{xk)J(xk)  gives  a  positive  semi-definite  model. 
Thus,  in  some  cases,  this  choice  of  generalized  Jacobian  which  gives  an  increase  in  positive  definiteness  may 
be  sufficient  to  assure; 

(yi(x)s,  yi{x)s)  +  (j**, -^^s)  >0, 

compensating  for  lack  of  positive  definiteness  in  the  function  /i  at  x*.  This  strategy  of  choosing  the  most 
positive  definite  choice  of  /^(i*)J(x*)  among  generalized  Jacobians  of  (dx*  -  b)+.  increases  the  possibility 
of  avoiding  ^ik  >  0  for  some  iterates. 

If  is  highly  non-linear  near  a  stationary  point,  and  if  the  residual  is  large,  then  using  the  Hessian 
approximation,  may  not  work.  Varah(1990]  uses  statistical  arguments  to  compare  the  relative  sizes  of 
the  Hessian  terms  in  the  non-linear  least  square  problem.  He  reports  that  even  when  the  the  norm  of  the 
term  Q(x)  is  relatively  large  with  respect  to  the  norm  of  J^J,  the  algorithm  using  only  J  works  well. 
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4.3  Recession  Behavior  of  the  Least  Square  Problem  Formulation 


Notice  in  the  trust  region  algorithm,  that  the  existence  of  an  accumulation  point  of  the  sequence  of  iterates, 
Xk,  depends  on  whether  the  iterates  s*^ay  bounded.  Since  the  least  square  objective  function  /  is  bounded 
below,  we  would  like  to  characterize  a  property  of  /  w'hich  assures,  that  as  long  as  the  function  values  at  the 
iterates  significantly  decrease  (by  satisfying  the  line  search  conditions  or  the  trust  region  conditions),  that 
the  sequence  generates  an  accumulation  point.  One  condition  to  avoid  is  that  the  function  f{x)  recedes,  that 
is  does  not  increase,  as  l|x||  — ►  cx5.  This  property  of  a  function  is  called  recession  behavior.  As  an  example, 
a  function  such  as  ^  recedes  to  0  in  the  direction  d  =  1.  Given  xq  >  0  and  A  >  0, 


lim 

A— •TOO 


1 

xo  +  Ad 


=  0, 


and  ^  strictly  decreasing  as  A  — •  +oo.  Another  property,  contrasted  to  recession  behavior,  is  called 

coercive  behavior.  Define  a  function  as  coercive  if  for  all  x. 


lim  /(x)  =  +0O  as  ||x||  — »  oc. 


Recall  that  solving  for 

inf  i{(Ax-6)+,(Ax-6)+)  (4.3.1) 

is  equivalent  to  solving  the  quadratic  programming  problem: 

inf  (z,z},  (4.3.2) 

X.Z 

subject  to:  Ax  -  b  <  z, 

and  the  optimality  conditions  for  the  unique  z,  and  solution(s)  x  which  solve  (4.3.2)  are: 

A'^z,  =  0, 

Ax  <  b  +  Zt, 

>0, 

(z.,  i4x  —  b  —  z,)  =0. 

Denote  the  solution  set  of  (4.3.1)  as  D  :=  {x|  Ax  —  b  <  z»  and  (z,,  Ax-b-z,)  =  0}  /  0.  Let  /  :=  {t|  z*  >  0} 
and  I'  :=  (ij  z*  =  0}.  Denote  A/,  the  matrix  formed  from  the  rows  of  A  for  the  indices  t  €  /.  Then 

n  =  {x|  Ajx  =  bi  z*i  and  A/'X  —  bj'  <  0}.  (4.3.3) 

If  d  is  some  direction,  such  that  for  x  €  H  implies  x  +  Ad  e  D,  for  all  A  >  0,  then: 


A/(x  +  Ad)  =  bi  +  z}  and  A/'{x  +  Ad)  —  bf  <  0. 


Then  for  all  A  >  0: 


A/Xd  =  0/, 


(4.3.4) 
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(4.3.5) 


At' ^  0/' . 

(4.3.4)  implies  that  d  €  ^\f(Ai),  the  null  space  of  A/.  For  (4.3.5),  equivalently, 

a/'  >  0  implies  (d.  Af,a)  <  0. 


Let 

K  :=  {Aj.a\a>0r} 

be  the  cone  generated  by  Af,.  Denote  the  polar  of  A", 

K’  :=  {d I  (<i,c)  <0  for  all  c  e  Kj. 

One  can  express  the  condition  that  d  is  a  direction  of  recession  of  D,  as  follows; 

deAf(Ai)nK'.  (4.3.6) 

Recall  that  h{x)  :=  ^{(^x  —  6)  +  ,  (i4x  —  b)^). 

Proposition  4.3.1. 

If  I  ^  n  and  d  6  A/‘{Ai)r\K' ,  then  there  exists  Aq  >  0,  such  that  for  A  >  Aq,  then  /2(x+Ad)  =  /2(i+Aod) 
and  (V /2(x  +  Ad),  d)  =  0. 

Proof.  For  i  €  /,  it  follows  that  Ai{x  +  Ad)  =  A/x,  for  any  choice  of  A.  Let  ri>  Ard  <  0.  For  t  €  /', 
such  that  Tj  <  0,  choose  Aq  to  satisfy; 

(i  +  Aod,  Oi)  -  6i  <  0. 

For  A  >  Aq,  then  f^ix  +  Ad)  =  /2(x  +  Aod).  The  second  part  of  the  proposition,  that  for  A  >  Ao,  (V/2(x  + 
Ad),  d)  =  0  is  implied  by  the  property  that  /2(x  +  Ad)  =  /2(x  +  Aod).  || 

Recall  that  /i(x)  ;=  j(Fi(x),  Fi(x)).  Since  fi  is  bounded  below  by  0,  then  liminfj, /i(y)  >  0.  There 
may  be  directions  along  which  which  fi  recedes  to  some  0,  (3  >  0,  never  reaching  it.  Define  a  recession 
direction  d  of  f\  at  some  x,  if 


lim  inf /i(x  +  Ad)  =  lim  /i{x  +  Ad)<cx) 

A>0  A— *00 

The  functions  fx  and  /2  from  the  problem  formulated  from  the  assignment  problem  in  section  2  do  not 
have  a  common  direction  of  recession.  The  function  /i(x)  =  —  Wj)^  is  a  coercive  function.  /2  is 

also  coercive,  since  the  polyhedral  set  defined  by  (2.3.1)  is  a  polytope.  Both  f\  and  /2  are  bounded  below. 
Therefore,  if  steps  of  sufficient  size  are  taken,  with  sufficient  reduction  in  the  objective  function  to  satisfy  the 
Armijo  conditions,  and  the  angle  between  the  step  and  gradient  stays  bounded  away  from  orthogonal,  and 
if  there  are  an  infinite  number  of  iterates,  then  the  sequence  of  iterates  must  have  a  cluster  point.  Similarly, 
for  the  trust  region  conditions  to  satisfy  successful  steps  for  an  infinite  number  of  iterates,  there  must  be  a 
cluster  point  in  the  sequence  of  successful  iterates. 
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5  Properties  of  Least  Square  Problem  for  Solving  a  System  of  Convex  Inequalites 

One  can  extend  the  results  for  systems  of  linear  inequalities  to  more  general  functions.  This  section  defines 
the  least  square  formulation  for  solving  finite  systems  of  inequalities  and  develops  approximating  properties 
of  generalized  differential  constructs  in  modeling  of  convex  functions.  The  results  are  similar  to  those 
for  the  case  of  linear  inequalities  developed  in  section  3.  In  section  5.1,  the' least  square  problem  is  defined 
for  this  case  of  convex  inequalities,  and  the  numerical  stability  of  this  problem  is  discussed.  In  section 
5.2,  the  approximating  properties  are  developed.  The  existence  of  an  identification  neighborhood  is  verified. 
The  details  for  selecting  generalized  Jacobians  and  Hessians  are  addressed.  Using  the  identification  neigh¬ 
borhood  property,  several  approximating  results  are  developed  using  models  based  on  generalized  differential 
constructs.  In  the  case  of  a  finite  system  of  inequalities  of  non-linear  convex  functions,  the  generalized 
Hessian  model  of  the  least  square  objective  function  does  not  perfectly  approximate  in  the  identification 
neighborhood  of  a  designated  point  (See  Proposition  3.3.8).  However,  the  generalized  Hessian  model  doe*' 
provide  second  order  approximation  in  an  identification  neighborhood  of  a  designated  point.  Similar  results 
are  obtained  for  the  generalized  Jacobian  model  of  the  mapping  F+,  as  well  as  for  the  generalized  Hessian 
model  for  V/.  These  approximating  results  are  used  in  the  development  and  analysis  of  the  algorithms  in 
section  6. 

5.1  Statement  of  Problem 

Suppose  Fj  :  /P*  >-*/?,  i  =  1, ....  m  are  convex  functions,  which  are  C^.  We  wish  to  find  x  which  solves: 

F(x)  <  0, 


in  a  least  square  sense.  Define 

/(x)  :=i{F^(x),F+(x)). 

Notice  that  /  is  a  finite  valued  convex  function,  being  the  sum  of  convex  functions  (the  pointwise  maxima 
of  the  individual  convex  functions  and  the  constant  function  zero).  We  wish  to  find  x  which  solves: 

inf  fix).  (5.1.1) 

X 

This  problem  can  be  expressed  as  a  convex  programming  problem  in  JT*  x  /T” : 

inf  ]:{z,z)  (5.1.2) 

(i.i)  z 

subject  to:  F(x)  —  z  <  0. 

The  infimum  of  this  problem  (5.1.2)  exists,  since  the  constraints  are  feasible  and  the  objective  function  is 
bounded  below.  If  a  solution  (x,  z)  exists,  then  z  is  unique  since  the  constraints  are  convex  and  the  objective 
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fionction  is  strictly  convex  .  Let  J{x)  be  the  Jacobian  of  F{x).  Then  the  KKT  conditions  for  any  solution 
(x,  2)  of  (5.1.2)  are: 

J^{x)2  -  0  €  ft", 


2  >  0  €  ft”*, 
F{x)  -  z  <0  €  ft”*, 


(.5.1.3) 


(z,  F{x)  -  2)  =  0. 

In  section  3.2,  numerical  stability  (based  on  the  definition  of  Robinson [1975])  of  the  solution  set  of  the 
least  square  formulation  for  solving  systems  of  linear  inequalities  was  established.  Robinson(l976i  treats  the 
case  of  systems  of  inequalties  of  nonlinear  functions  and  shows  an  equivalence  between  regular  systems  and 
stable  solution  sets.  For  the  case  of  a  system  of  inequalities  of  convex  functions  defined  in  (5.1.3),  the 
formulation  as  a  least  square  problem  also  yields  a  system  of  inequalities  which  ^re  regular  and  hence  the 
solution  set  is  stable. 


5.2  Approximating  Properties  of  Taylor  Series  Using  Generalized  Differential  Constructs 

Since  we  assume  F  is  convex  and  CF  and  hence  finite,  then  F+{x)  is  Lipschitz  continuous  of  some  rank  7 
in  a  neighborhood  of  x,  and  then  there  is  a  well-defined  generalized  Jacobian  set  of  F+(x).  Since  the  gradient 
of  f(x)  exists  and  is  Lipschitz  continuous  on  lower  level  sets,  then  there  is  a  well-defined  generalized  Hessian 
set  of  /(x),  comparable  to  the  linear  case  in  Section  3.  We  use  the  same  notations  for  these  constructs  in 
this  section,  as  in  Section  3.  Denote  the  Jacobian  of  F  at  x  as  J{x).  Using  the  definition  of  the  generalized 
Jacobian,  similar  as  in  Section  3,  the  possible  choices  of  generalized  Jacobian  J{x)  of  F+(x)  are: 

{0  if  Fi(x)  <  0, 

J{iJ)  ifF<(x)>0, 

>^iJ(i,j)  if  Fi(x)  =  0,  where  Ai  e  [0, 1]. 

Proposition  5.2.1.  Existence  of  IndentUication  Neighborhood 

Let  X.  be  given  and  let 

/V.):=  {f|F.(x.)  =  0}, 
l^(x,)  :=  {tlFi(x,)  >  0}, 

I~ix.)  :=  {t|Fj(x.)  <  0}. 

Then  there  exists  a  neighborhood  N{x,,r)  of  x.,  such  that  for  x  e  /V(x,,r), 

/°(x)C  Ax.), 

/+(x.)  =  /+(x)\Ax.),  (5.2.1) 

/“(x.)  =  /"(i)\/”(x.). 

Proof.  Define  Cj  .=  {x|  Fi(x)  <  0},  and  note  that  since  each  Fi  is  convex  and  and  hence  finite,  then 
each  Ci  is  closed  and  convex.  If  t  e  /“(x,),  then  Fj(x,)  <  0,  and  since  each  Fj  is  continuous  there  exists 
a  neighborhood  /V(x,,rj)  about  x,,  such  that  for  x  €  N{x,,ri)  implies  Fi(x)  <0.  If  t  €  /+{x,),  then 
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Fi(x,)  >  0  and  x,  ^  Ci  which  is  a  closed  convex  set.  Thus,  there  exists  a  neighborhood  IV{x,.r,)  about  x., 
such  that  for  x  G  N{x,,  Xj)  implies  Fi{x)  >  0. 

Choose  X  >  0  so  that, 

X  <  min{xi}. 

If  X  €  7V(x,,x),  then 

I^{x)  C  /°(x.), 

I^(x.)  =  /^(x)\/°(x.)- 

/“(x.)  =  l~{x)\I°{x,). 

II 

Proposition  5.2. 1  shows  the  existence  of  an  identification  neighborhood  holds  for  the  convex  functions. 
In  the  linear  case,  one  can  perfectly  model  (/lx,  —6).^  from  points  in  an  identification  neighborhood  using  the 
first  order  Taylor  series  based  on  any  choice  of  generalized  Jacobian.  For  the  case  of  systems  of  inequalities 
of  convex  functions,  the  proposition  below  shows  that  the  generalized  Jacobian  model  attains  first  order 
approximation  of  F+(x,)  from  any  point  x  within  this  Proposition  5.2.1  neighborhood  of  a  point  x. 

Proposition  5.2.2.  First  Order  Approximation  Using  a  Generalized  Jacobian  Model 

Let  X,  be  given  and  let  N(x,,r)  be  an  identification  neighborhood  which  satisfies  (5.2.1)  in  Proposition 
5.2.1.  Let  X  €  N(x,,r),  and  let  p  :=  x,  -  x.  Since  each  Ft  is  convex,  denote  7  to  be  the  Lipschitz 
constant  of  JF  on  N{x,,r).  Then  the  following  holds: 

1.  For  any  J{x)  €  dF+(x), 

F^(x+p)-F+(x)-J(x)p  =  o(||p||), 

and  specifically, 

||F+(x  +p)  -  F+(x)  -  J(x)p||  <  ^IIpII^- 

2.  For  x  e  N(x,,r),  any  generalized  Jacobian  J(x,)  €  dF+(x,)  satisfies: 

tlF^x)  -  F+(x.)  -  J(x.)(-p)||  <  ||J;o(,.)||  IIpII  +  I||p||2. 

3  For  a  particular  x  €  N(xt,r),  there  is  at  least  one  generalized  Jacobian  J(x,)  e  dF-t-ix,),  which  satisfies 
for  this  particular  x: 

||F^(x)-F+(x.)-J(x,)(-p)|t<2||pf. 

Proof.  For  1,  since  N{xt,r)  is  bounded,  and  F  is  convex  and  C^,  and  7  is  the  given  Lipschitz  constant 
of  J{x),  then  the  Jacobian  of  F  on  IS'(x,,r)  satisfies,  by  Proposition  3.3.3, 

||F(x  +  p)-F(x)- J(x)p||  <  ^||p||2. 

Note  that  x  +  p  =  x, .  Consider  any  i: 


46 


1.1  If  i  ^  so  i  ^  f°{x)  and  thus,  i  g  /"(x)  u /"^(x).  In  these  cases,  (J(x)),  is  unique. 

1.2  If  i  e  /^(x.)  and  i  e  I~{x)  U  /^(x),  then,  as  above,  ( J{x)),  is  unique.  If  i  e  /°(x),  then  F^(x)  =  0  and 
since  F,  is  convex,  then  F,{x  +  ap)  <  0,  for  any  a  €  [0,  1|.  Since  F,(x,)  =  v  —  F,,.-  be  approximated 
first  order  about  x,  by  using  [J{x))i,  then  choosing  (J(x)),  =  A(J'(i))i,  for  any  A  e  [0, 1],  gives  •  as 
good  or  an  even  better  approximation.  Specifically,  from  the  Lipschitzian  property  of  ,7(x),  for  some 

||F,(x.)  -  F^(x)  -  iJ(x)),pl\  <  yllpll^. 

This  means  that 

ll(7’-s(x.)).  -  iF’^(x))i  -  A(J(x))ip|| 

=  l|-A(J(x)).p|| 

<  a|||p||^ 

<  |llPll^- 

In  either  case  1.1  or  1.2,  then 

||F+{x  +p)  -  F*(x)  -  J(x)p||  < 

For  2,  since  JF  is  Lipschitz  continuous  of  rank  7,  and  since  all  the  error  in  approximating  F+(x)  using  a 
first  order  approximation  at  x,  is  contained  in  the  components  with  indices  i  e  /°{x,),  then 

||F.(x)  -  FUx.}-J(x.)(-p)ll  <  IIpII  +  ^||p||2. 

For  3,  let  a  particular  x  €  N(x,,r)  be  given  and  let 

/°  :=  y^(x.)\/"(x), 

/+  :=  /^(x)n/V.), 

:=  /~(x)  n 

For  i  e  ,  choose  (J(x.))i  (»7(x.)),;  for  i  €  /“,  choose  {J(x,))i  :=  0;  and  for  i  e  choose  (J(x, ))j  :  = 

MJ{x,)]i,  where  A  g  [0,  Ij.  Then  this  particular  choice  of  T(x.)  g  9F+(x,)  satisfies  for  this  particular  x, 
by  1  and  2  above,  and  direct  calculation, 

||F.(x)-F.(x.)- J(x.){-p)||  <  l\\pf. 


The  gradient  of  /(x)  is  well-defined  on  R".  V/(x)  =  J^(x)F+(x),  for  any  choice  of  J(x)  g  dF^{x).  V / 
is  continuous,  being  the  gradient  of  a  differentiable  convex  function.  Since  /  is  convex,  bounded  below  and 
continuously  differentiable,  then  the  gradient  of  /  is  Lipschitz  continuous  (order  1 )  of  some  rank  7  on  lower 
level  sets.  If  Af  >  0  and  S  :=  {x|  f{x)  <  A/},  then  there  exists  7  >  0,  such  that  for  u,  v  g  S, 

||V/(u)  -  V/(r)l|  <  7||u  -  i'||,  {,'5.2.2) 
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I'htTt'fore.  the  Jerieralizeti  Hessian  set  of  V f(x].  denoted  dVf(x).  is  well-defined  on  S.  Selecting  a  generalized 
Hessian  fur  /  is  done  sinularlv  ;us  in  section  3  Let  I',  denote  the  Hessian  of  F, .  One  can  calculate  the 
generalized  Jai'ofiian  of  V  fix)  [x)F .  (x),  iLsing  the  definition.  Let  H  be  the  set  of  points  where  V/  is 

not  ditferentiable  t’alculating  any  limit  //*.■  as  xji  — >  x. 

fF  lini  where  yi  ^  H 

Vi  -z 

*  lirn  JiJiyi)'  )F,{yi)  f  J{yiV J{yi]  where  y;  ^  fi 

y. 

^  lim  V  r.iyHFlyi)  +  J(yi)^J(y;)  where  y;  ^  n, 

V;  —X 

i-  / '  1  yi ) 

where  /  '  y; ;  ^  {n  h\,yi  t  >  0}  Lsing  the  same  t^-pe  of  argument  in  Section  3,  one  can  show  the  choices  of 

generalized  Hessians  depend  only  on  the  choice  of  generalized  Jacobian,  J(x)  6  dF^[x), 

F- X .  •=  d:  sxiF.  ix) )  <=>  f/(x)  =  J^(x)  J(x)  +  ^  r,(x)F,(x)  for  some  J(x)  e  dF+(x),  (5.2.3) 

Prop<.»ition  3  3  8  assures  that  the  generalized  Hessian  model  (based  on  any  generalized  Hessian  at  x  ) 
attains  perfect  appro.ximation  of  the  linear  inequalities  least  square  function  at  x,  from  any  point  x  in  an 
identification  neighborhood  of  x..  Proposition  5.2.3,  below,  shows  that  the  generalized  Hessian  model  (based 
on  any  generalized  Hessian  at  x)  attains  second  order  approximation  of  /(x, ),  the  least  square  function  of 
inequalities  of  convex  C'^  functions,  from  any  point  x  in  an  identification  neighborhood  of  x, . 

Proposition  5.2.3.  Second  Order  Approximation  Using  a  Generalized  Hessian  Model 

Let  X.  fie  given  and  let  .V(x.,  r)  be  an  identification  neighborhood  which  satisfies  (5.2.1)  in  Proposition 
')  2  1  Since  F  is  F''^  convex  on  .V(x.,r),  let  i  be  the  order  1  Lipschitz  constant  for  V^(^(F(i),  F(x))).  Let 

x  ^  .V  X.,  r;  and  let  p  :  -  x,  -  x  Then  for  any  J{x)  e  dF^(x)  which  determines  some  H(x)  e  dVf{x), 

fix  f  p)  -  /(x)  -  (p,  V/(x))  +  ^{p.  H{x)p)  =  oIIpII^, 

or  specifically,  for  the  Lipschitz  constant,  of  V^(^(F(x),  F{x))): 

l/fx  4  p)  -  fix)  -  (p,  V/(x))  -  ^{p,H{x)p)\  <  ^IIpII^. 

Z  o 

Proof.  I  he  proof  is  similar  to  the  proof  of  Proposition  5.2.2.  Note  that  x  4- p  =  x, .  Let  7  be  an  order  1 
Iups<hitz  con.stant  of  V'^iF'Ffx).  Fix))  on  Nix,,  r).  which  exists  xince  i'V(x,,  r)  is  bounded  and  F  is  convex 
(  t'on.sider  any  i: 

!  If  I  i  /  x,  c  then  !  ^  /’(x)  If  !  6  /  lx)  U  I  'lx),  then  lJ(x)),  is  unique. 

2  If  !  ^  /  'x. ;  If  I  p  /  (xj  U  I '  (x),  then  (./2(x)),  is  unique.  If  i  €  l'^[x).  then  there  are  other  possible 

generalized  Jacobians.  ./(x)  dF^lx).  Notice  that  the  Hessian  term  cancels  out,  since  F,(x)  =  0.  Since 
N F'  X  *  p;.  F  X  i  pi;  can  be  approximate*!  .second  order  by  the  second  order  Taylor  series  about  x  using 


the  Hessian  model  with  V‘^^{F{x),  F(x)),  then  choosing  {J{x)}i  =  X{J{x))i,  for  any  A  €  [0, 1],  gives 
just  as  good  or  an  even  better  approximation,  as  in  the  proof  of  Proposition  5.2.2. 

In  either  case,  then 

\fix  +  p)  -  fix)  -  lj>,Vf(x))  -]^{p,H[x)p)\  < 


FYom  (5.2.3),  any  generalized  Hessian  H{x)  of  /  at  i  can  be  written  as 

H(x)  :=  j'^{x)J{x)  +  Qix) 

where  Qix)  ;  =  ^  ri(x)Fi(x).  (5-2.4) 

Lemma  2. 

Let  X,  be  given.  Then  there  is  some  <7  >  0  and  there  is  a  neighborhood  A^(x,,r)  of  x,,  such  that  for 
X  e  Af{x,,r),  given  any  J(x)  6  dF+(x)  and  any  J(x,)  €  5F^.(x,), 

IK  J(x)  -  J(x,))'^F+(x.)|l  <  a\\x  -  x.||. 

Further,  there  is  some  /?  >  0  such  that  for  any  x  e  N{x,,r),  any  J(x)  €  9F+(x),  and  any  J(x,)  e  9F+(x,), 

IK  J(x)  -  Jix,))'^ F+{x,)  -  Qix.)(x  -  x.)||  <  p\\x  -  x.lK. 

Proof.  For  the  first  part,  let  /'^(x.)  :=  {e| (F(x,)),-  >  0}.  If  F(x.)  <  0,  then  the  lemma  is  satisfied,  so 
consider  the  case  where  /''■(x,)  ^  0.  Form  a  neighborhood,  where  r  >  0  is  chosen  so  that  if  x  e  jV(x,,r), 
then 

/+(x.)C/^(x)  ;={i|(F(x))i>0}. 

For  indices  i  e  /■*'(x,),  then  (J(x))j  is  unique  and  (y(x,))i  is  unique.  Since  F  is  convex  C^,  then  JFix)  is 
Lipschitz  continuous,  of  some  rank  7  of  order  1  on  N{x,,r).  Then  one  can  approximate: 

IKJ(x)  -  J(x.))^p^(x.)ll  =  II  X!  -  Ji(a:.))F,(x.)||  <  7||f’+(x.)ll  l|a:  -  x.||  <cr||x-x,||, 

ier+(x.) 

for  any  a  >  7||F+(x.)||. 

For  the  second  part,  since  F  is  convex,  let  7g  be  the  order  1  Lipschitz  constant  of  Qix)  on  N(x.,r). 
This  implies  there  exist  7*  for  i  €  /■*'(x,)  and  x  e  IV(x,,r),  such  that 

IK  J(x)  -  J{x,))i  -  r,(x.)(x  -  x.)|l  <  7i||a:  -  x,f. 

It  follows  for  any  x  e  N(x,,r),  any  J{x)  e  dF+ix),  J(x,)  e  dF+(x,),  then 

IKJ(x)  -  y(x.)^F+(x.)  -  Q(x.)(x  -x,)||  =  Ij  X  l^tMiWi^)  -  •fix>))i  -  ri(x.)(x  -  x.Jll 

</?||x-x.||^. 
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since  for  t  €  /'''(x,),  {J{x))i  is  unique  and  (J(x,))i  is  unique.  || 

Lemma  3,  below,  shows  that  the  first  order  Taylor  series  of  V/  (about  a  point  x  in  an  identification 
neighborhood  of  a  point  x,)  based  on  a  generalized  Hessian  of  dV /(x)  attains  first  order  approximation.  The 
lemma  also  shows  for  each  particular  point  x,  the  existence  of  at  least  one  generalized  Hessian  H*  €  dV /(x. ) 
for  which  the  Taylor  series  about  x,  models  V/(x)  first  order.  The  notation  //*  €  dV/(x.)  is  used  since 
the  Taylor  series  about  x,  using  this  particular  generalized  Hessian  H*  models  V /(x). 

Lemma  3.  First  Order  Approximation  of  V /. 

Let  X,  be  given.  Let  N(x,,r)  be  a  Proposition  5.2.1  identification  neighborhood  of  x. .  Since  F  is 
convex,  let  7  be  an  order  1  Lipschitz  constant  of  V^(|(F{x),  F(x)))  on  N{x,,r).  Theii  for  x  e  N{x,.r)  and 
for  any  J{x)  €  dF+(x)  which  determines  some  H(x)  6  dV /(x), 

||V/(x,)  -  V/(x)  -  H(x){x,  -  x)||  =  ol|x,  -  x||, 

or  specifically,  for  the  Lipschitz  constant  7: 

l|V/(x.)  -  V/(x)  -  H{x){x,  -  x)||  <  l\\x,  -  x||2. 

Additionally,  for  each  particular  x  6  /V(x.,r),  there  exists  a  generalized  Hessian,  H*  6  dV/(x,),  such  that 

llV/{x)  -  V/(x.)  -  Hlix  -  x.)ll  <  l\\x  -  x.||2. 

Proof.  Let  p  =  x,  -x.  For  the  first  conclusion,  use  the  same  argument  as  for  Proposition  5.2.2.  Since  F  is 
convex  on  N(x,,r),  then  V^^(F(x),  F(x))  is  Lipschitz  continuous  (order  1)  of  some  given  rank  7.  Then 

||J(x+pfF(x+p)- J(x)^F(x)- J^(x)J(x)p-^Fi(x)r,(x)p||  <  2||p||2. 

i 

Now,  for  each  i  consider  the  same  cases  as  in  Proposition  5.2.2.  The  only  case  where  there  is  not  a  unique 
choice  of  generalized  Jacobian,  J{x)  €  dF+{x),  is  for  i  €  /°(x,)n/°(x),  where  again,  any  choice  of  generalized 
Jacobian  gives  a  even  better  or  just  as  good  an  approximation,  as  in  Proposition  5.2.2.  Therefore,  the  first 
conclusion  follows. 

The  second  result  uses  the  same  argument  as  in  Proposition  5.2.2.  || 

An  approximating  property  of  Lipschitz  continuous  Jacobians  follows: 

Proposition  5.2.4. 

Let  G  :  W'  0?"'  be  C'.  Let  x,  be  given,  and  let  the  Jacobian  of  G,  denoted  JG,  be  Lipschitz 
continuous  of  order  7  on  some  neighborhood  N(x,,r).  Let  u,v  £  N{x.,r).  Then, 

|;G(t;)  -  G(u}  -  J{x»)(v-u)\\  <  ^(||v  -i.ll  +  ||u  -  x.||)||i'  -u||. 


50 


Proof. 


||G(t;)-G(u)-J(x.)(t^- 1^)11  = 


J{u  +  t(v  -  u)}  dt'j  (v  -  u)  -  J'(x.)(v  -  u)|| 

<  f  l|J(w  +  i(u  -  w))  -  J(x.)ll  (it  ||(v  -  u)|| 

Jo 

^  f  7||(i  - t)M  +  - x.y (it iKv  - u)|| 

Jo 

=  [  7ll(l  -  0(w  -  X*)  +  t{v  -  X.)l|  dt\\{v-  w)|| 

Jo 

“^(/  ll(i^’-2:.)||j  i|(t’ -u) 

=  ^(lit’-^*ll  +  l|w-^.||)ll’^-’i||- 


The  result  below  shows  one  more  approximating  property,  relating  other  points  in  the  identification 
neighborhood  of  a  point. 

Lemma  4. 

Let  x«  be  given.  Let  N{x,,r)  be  a  Proposition  5.2.1  identification  neighborhood  of  x,,  and  let  u.v  e 
Ar(x,,r).  Then  there  is  some  <r  >  0,  such  that  any  J{v)  e  dF(v),  any  J(u)  €  dF{u)  and  Q{xt)  = 
X;/+(*.)ri(x.)F<(x.)  satisfy: 

\\{Jyv)  -  J(u)fF+(x,)  -  Q{x,){v  -  u)||  <  (7(||t;  -  x.|l  +  ||u  -  x.||)||t;  -  u|| 

Proof.  Let  iV(x,,r)  be  an  identification  neighborhood  of  x,.  Let  7j  be  the  order  1  Lipschitz  constant  of  Pj 
on  N{xt,r],  and  let  u,  v  €  N{x,,r)  and  J{v)  €  dF(y),  J{u)  €  dF{u).  Then 

||(J(v)  -  J{u)VF+{x*)  -  Q{x,)(v  -  u)||  =  II  Y.  KMiiAv)  -  J[u)),  -  r.(x.)(r  -  u))||. 

i€/+(i.) 

Since  I'^(x,)  C  and  /'•'(x.)  c  /'*'(«),  then  if  i  e  I'^{x,),  then  t  e  /“''(v)  and  i  e  and 

{J(v))i  =  (J{v))i,  [J{u))i  =  (J{u))i.  By  Proposition  5.2.4  above,  for  each  t  e  then 

IK  J(v)  -  J{u))i  -  rj(x,)(v  -  u)||  <  ^(||v  -  x,||  +  ||u  -  x.||)||v  -  u||, 

and  then 

II  Y  -'^(u))i -ri(a;,)(v-u))||  < 

«e/+(i.) 

Y  f^'^(=f»)y(l|v-x.||  +  ||u-x,||)||v-u||  < 
ie/+(x.) 

a(||i;  -x.ll  +  11«  -x.lDllv  -  till, 

where  (7  =  || E,€/+(x.) 

II 
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6  Generalization  of  Algorithms  for  Solving  a  System  of  Inequalities  of  Convex  Functions 

In  this  section,  we  develop  algorithms  for  solving  the  system  of  inequalities  of  convex  functions  in  a  least 
square  sense  and  analyze  convergence  properties  of  these  algorithms.  Refer  to  the  algorithms  described  in 
section  1.5  for  the  solution  of  least  square  problem  of  systems  of  equalities.  We  extend  these  algorithms  for 
the  case  of  inequalities  of  convex  functions  using  generalized  differential  constructs.  In  section  6.1,  local 
convergence  results  are  addressed.  We  extend  Newton’s  method  and  show  q-quadratic  local  convergence. 
We  compare  this  method  with  the  generalized  Newton’s  algorithm  for  nonsmooth  functions  developed  by 
Robinson[1990j  which  he  bases  on  a  point-based  approximation.  We  develop  and  analyze  a  quasi-Newton 
symmetric  secant  update  in  this  section  and  define  a  bounded  deterioration  property  for  secant  updates 
using  generalized  differential  constructs.  By  assuming  this  property  holds  for  the  secant  updates,  we  show 
local  q-linear  convergence.  The  extensions  of  Gauss-Newton  and  Levenberg-Marquardt  methods  for  the  case 
of  systems  of  convex  inequalities  are  discussed,  as  well  as  the  local  convergence  results.  In  related  work, 
Burke[1983],  and  Burke  and  Han[1986]  describe  a  method  for  solving  this  type  of  system  of  inequalities, 
using  the  distance  function  as  a  penalty  type  objective  function  and  projections  to  derive  search  directions. 
They  present  global  convergence  results  for  their  search  direction  using  an  Armijo  line  search  method.  Their 
local  second  order  convergence  result  for  the  non-linear  functions  depends  on  the  Mangasarian  Promowitz 
constraint  qualification  (MFCQ)  condition,  which  is  discussed  with  the  results  in  this  section. 

In  section  6.2,  we  discuss  global  convergence  of  the  Armijo  line  search  and  trust  region  method  applied 
to  solving  a  system  of  inequalities  of  convex  functions  and  equations  of  non-linear  functions  in  least 
square  sense. 

Consider  the  problem,  similar  to  (3.1.2),  with  the  function  Fj,  now  being  convex  in  each  dimension,  and 
C2.  That  is; 

where  F,  is  C^, 

F2  :  9?"  where  Fz  is  and  component- wise  convex, 

and  we  wish  to  solve: 

inf/(x)  where  /(x)  :=  /i(x) -I- /2(x)  :=  ^(Fi(x),  Fi(x))  -h  ^(F2*‘(x),  F2^(x)).  (6.1) 

Denote  F)  to  be  the  Hessian  of  (Fi)i  and  F^  to  be  the  Hessian  of  (F2)i.  Denote  Ji(x)  to  be  the  Jacobian  of 
Fi(x),  and  J2{x)  to  be  a  generalized  Jacobian  of  ■  Then  a  generalized  Hessian  of  /  is: 

ml 

H{X)  ;=  Hy{x)  //2(x)  =  j'({x)Mx)  ^Fi(x)(Fi(x))i  +  J2^(x)y2(x)  -f  Y.  n(^)(^2(x)).. 

i=l  i€/+(i) 
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6.1  Local  Convergence  Results 

Local  convergence  results  use  the  unsealed  search  direction,  as  in  section  4.1. 

6.1.1  Newton-like  Method  for  Solving  V/(x)  =  0:  Define  the  generalized  Hessian  Newton  Method  for 
minimization,  where  Hk  denotes  a  generalized  Hessian  at  iterate  xk-  Then  dk  solves: 

Hkdk  =  -S7fixk), 

where  Xk+i  :=  Xk  dk-  A  solution  dk  exists  and  is  unique  if  Hk  is  invertible. 

Theorem  5.  Newton-like  Method  for  Solving  Convex  Inequalities  and  Non-linear  Equations  in  a  Least 
Square  Sense. 

Let  /  be  given  as  in  (6.1),  where  Fi  is  and  Fq  is  convex.  Assume  there  exists  x,,  such  that 
V/(x.)  =0. 

1.  Let  r  >  0,  and  let  yV(x,,r)  be  a  neighborhood  on  which  V^/i  is  Lipschitz  continuous  of  order  1  for 
some  rank.  /i  >  0. 

2.  Let  some  ^  >  0  be  such  that  any  generalized  Hessian  H(x,)  e  dV/(x.)  is  invertible  and  satisfies: 

\\H-\x,)\\  < 

Then  there  exists  e  >  0,  such  that  if  xq  €  N{x,,t),  then  the  sequence  of  iterates,  x/t+i  :=  Xk-H~\xk)'^  f{xk) 
are  well  defined  and  converge  to  x,  and  satisfy  for  some  7  >  0, 

||Xfc  +  i  -  x,||  <  0^\\xk  -  x.ll^. 

Proof.  Since,  by  hypothesis,  all  the  generalized  Hessians  H{x,)  e  dV/(x.)  are  invertible,  Proposition  1.3.1. 
assures  a  neighborhood  of  invertibility  about  x..  By  choosing  e  sufficiently  small,  for  any  x  £  N{x,.t),  and 
any  H{x)  e  dVf(x},  then  H~^{x)  exists  and 

\\H-Hx)\\<i3. 

By  Lemma  3,  since  F2  is  convex  and  C^,  and  choosing  e  sufficiently  smaller  if  necessary  so  that  .V(x. ,e) 
is  an  identification  neighborhood  of  i,,  then  for  some  72.  for  x  6  N{x,,t)  and  any  generalized  Hessian  of 
/2(x),  H-iix),  it  follows  that 

I|V/2(X.)  -  V/2(X)  -//2(X)(X.  -X)||  <  ^llx.  -Xll^. 

Given  1,  that  V^/i  is  Lipschitz  continuous  of  some  rank  71  on  lV(x,,  r),  then  for  7  =  71  +  72  >  0,  choosing 
e  <  r,  then  for  all  x  €  N{x»,  e): 

l|V/(x.)  -  V/(x)  -  H(x)(x.  -  x)l|  <  7||x.  -  xf. 
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Choose  e  smaller,  if  necessary,  to  satisfy; 


We  show  the  case  for  k  =  0.  Since  H~^{x)  exists  on  N{x,.t),  then  ii  is  well-defined  and  satisfies,  for  any 
generalized  Hessian  of  //(xq): 

Xi  -  X,  =  xo  -  X.  -  //“*(xo)V/{xo) 

=  xo  -  X,  -  //-‘(xo)lV/(xo)  -  V/{x.)l 
=  //“‘(xo)(V/(x.)  -  V/(xo)  -  H(xo){x,  -xo)|. 

Using  the  result  above,  then 

Iki  -  xoll  <  ||//-‘(xo)||  Iliv/(x.)  -  V/(xo)  -  //(xo)(x.  -  xo)l|| 

<  /J-yllxo  -  x.||^. 

Using  induction  for  the  general  k  and  -|-  1  is  accomplished  the  same  way.  || 

This  result  establishes  q-quadratic  local  convergence  of  this  Newton-like  method  based  on  generalized 
Hessians.  Robinson [1990|  extended  Newton’s  method  f^r  nonsmooth  functions  having  a  point  baseu  approx¬ 
imation  satisfying  second  order  approximation.  For  our  special  least  square  function,  using  a  linearization 
about  Xfc  based  on  a  generalized  Hessian  at  x*  to  model  V/(i,)  assures  a  second  order  point  based  second 
order  approximation.  Robinson’s  point  based  approximation  is  not  required  to  be  linear  and  thus,  rather 
than  using  the  Banach  lemma  for  invertibility,  he  develops  an  implicit  function  property  of  invertibility  which 
handles  the  non-linear  point  based  approximation.  Our  result  is  specialized  in  that  we  define  a  specific  linear 
point  based  approximation.  The  result  in  Theorem  5  closely  parallels  the  traditional  proof  for  the  smooth 
case.  Robinson  showed  how  to  use  a  pioint  based  approximation  to  get  a  convergent  Newton’s  method,  while 
this  work  shows  that  a  system  of  linear  equalities  natureilly  gives  rise  to  a  point  based  approximation. 

6.1.2  Adaptation  of  Symmetric  Secant  Update  Q.iasi-Newton  Direction:  A  problem  with  using 
Newton’s  method  for  finding  search  directions  is  that  the  Hessian  or  generalized  Hessian  may  not  be  positive 
definite,  and  possibly  not  invertible,  so  d  may  not  be  a  descent  direction.  However,  even  if  the  generalized 
Hessians  are  all  positive  definite  in  a  neighborhood  of  a  stationary  point,  the  difficulty  in  computing  the 
Hessian  at  each  iterate  is  another  reason  that  practical  algorithms  use  a  quasi- Newton  method.  This  approach 
uses  Hessian  approximations,  avoiding  computational  steps  to  calculate  the  full  Hessian.  We  extend  the 
special  symmetric  secant  update  method  for  least  square  nonlinear  equations  to  solve  for  a  zero  of  V/(x). 
The  method  uses  the  available  generalized  Jacobians,  J{x),  and  uses  a  secant  update,  Bk,  to  approximate 
the  second  order  term: 

m  I 

Bk  ^  Q{xk)  =  T{Fi(xk)),r\{xk)  +  T  (F2(xfc)).r2(xfc). 


Denote 


F(x)  = 


An  algorithm  using  the  symmetric  secant  update  for  generalized  Jacobians  is  as  follows:  Solve  for  dk 


+  Bk)dk  =  -V/(xfc),  and  set: 


3;*+!  Xk  +  dk, 

Vk  V/(x*^,)  -  Vf(xk), 

y*  ■=  J^{xk+\)F(xk^\)  -  j'^(xk)F{xk+i), 

n  n  ,  (y*  “  Bkdk)dl  +  dkiy*  -  BkdkV  {vt  “  Bkdk,  dk)yk(il 

- — 

The  updated  Bk+i  satisfies  the  quasi-Newton  condition: 


Bk+idk  =  y*- 


Dennis,  Gay,  and  Welsch[1981]  developed  this  special  symmetric  secant  update  for  a  least  square  solution  of 
non-linear  equations.  For  the  smooth  case,  they  show  this  update  solves  the  following  least  change  problem: 

min  \\(T'^r\Bk^,-Bk)T-^\\r 

Sk+i 

subject  to  B)c+i  -  Bjk  symmetric,  =  yjf, 

where  T  €  3?"*",  a  weighting  matrix,  is  non-singular  and  satisfies  TT^ dk  =  yu- 

Since  this  method  is  specially  tailored  for  the  least  square  solution  of  non-iinear  equations,  it  is  an 
appropriate  quasi- Newton  method  to  extend  to  the  least  square  solution  of  inequalities  of  non-linear  convex 
functions.  This  update  is  called  a  rescaled  symmetric  secant  update,  because  of  the  weighting  matrix  T. 
This  update  and  the  unsealed  update  are  analyzed  in  Dennis  and  Walker [1981]  for  the  smooth  case.  Local 
convergence  analysis  of  quasi- Newton  methods  for  generalized  differential  constructs  requires  that  we  define 
a  bounded  deterioration  property  for  the  updates,  Bk- 

Definition.  Bounded  Deterioration  Property  for  Generalized  Hessians. 

Let  N{x,,r)  be  an  identification  neighborhood  of  a  stationary  point  x,,  with  iterates  Xk  €  N{x.,r). 
Define  J{xk)  €  dF+{x*),  by  (J^ixk)^  =  0,  for  i  e  f^(xkh  (Note  that  for  indices  i  <_  I'^ixk)  U  I~{xk), 
that  J-2{xk))i  is  unique.)  Define  e  ^F^.(x,)  to  be  the  generalized  Jacobian  which  models  x^  —  x,,  as 
defined  in  Proposition  5.2.2,  and  choose  [J^Yk  =  0  for  indices  i  e  /°(x,).  (Again,  note  that  for  indices 
I  €  /^(x,)U  /'(x,).  that  (-^2)11  unique.)  Say  the  updates  Bk,  such  as  in  (6. 1.2.1),  satisfy  the  bounded 
deterioration  property  if  there  exists  an  a,  such  that  for  all  k, 

ll^k^i'^k^t  -  {Jk+\fJk+i  +  Bk+i  -Q*ii  <  11-^  Jk  -  (Jkf  Jk  +  Bk  -  <?.||  -t-Qrmax())x*+j  -x.)J.))x*  -x.))). 

(6. 1.2.2) 
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The  particular  Jk  €  dVf{xk)  and  e  5V/(x,)  that  are  chosen  allow  consistent  comparison  of  (6. 1.2. 2)  at 
each  iteration  fc.  Since  in  an  identification  neighborhood,  I°(xk)  C  I°(x,),  and  since  models  Xk  —x,.  the 
distance  between  these  two  generalized  Jacobians  is  the  distance  between  the  compact  convex  sets,  dF(xk) 
and  dF{x,).  One  could  choose  other  generalized  Jacobians  in  the  definition  of  the  bounded  deterioration 
property  if  this  selection  of  generalized  Jacobian,  J(xk)  €  dF{x)  is  compared  properly  with  the  closest 
element  J(x,)  e  dF{x,).  By  choosing  a  particular  generalized  Jacobian  at  each  iteration,  (J{xk)),  =  0, 
for  i  €  P(xk),  and  if  the  updates  satisfy  the  bounded  deterioration  property,  then  q-linear  convergence  is 
assured  in  some  identification  neighborhood,  which  is  also  a  neighborhood  of  invertibility. 

Theorem  6.  Local  Convergence  of  Symmetric  Secant  Method  for  Inequalities. 

Let  /  be  given  as  in  (6.1).  Let  Fi  be  and  F^  be  convex.  Let  x,  be  given  and  assume  V/(x.)  =  0. 
Let  iV(x,,r)  be  an  identification  neighborhood  of  i,.  Assume  V^/i  is  Lipschitz  continuous  of  some  rank  71 
and  let  72  be  the  order  1  Lipshitz  constant  of  j(F2(x),  F2(x))  on  N(x,,r),  where  7  :=  71+72.  Assume  that 
all  generalized  Hessians,  //(x.)  e  5V/(x,)  are  invertible  and  satisfy  on  N(x,,r),  ||//"'(x,)||  <  /?,  for  some 
0.  Suppose  the  sequence  of  iterates  and  updates  (6. 1.2.1),  where  the  generalized  Jacobians  at  x*  are  chosen 
so  that  {J{xk))i  =  0,  for  i  €  I°{xk),  satisfy  the  bounded  deterioration  property  on  N{x,,r)  for  some  a. 
Then  there  exists  positive  constants  «  and  6,  such  that  if  ||xo  - x, |1  <  e  and  1|.^ Jo  -  {Jq  )^ Jq  +  Bo-Q,\\  <  6, 
then  the  sequence  of  symmetric  secant  update  iterates  generated  by  (6. 1.2.1)  are  well-defined  and  converge 
q-superlinearly  to  x, . 

Proof.  Choose  e  small  enough  to  insure  a  neighborhood  of  invertibility  of  x,,  choose  6  to  satisfy; 

6^5  <1,  (6. 1.2.3) 

and,  if  necessary,  choose  e  sufficiently  smaller  to  also  satisfy; 

3ae  <  6,  and  37e  <  26.  (6. 1.2.4) 

We  prove  the  two  statements  (6. 1.2. 5)  and  (6. 1.2.6)  below,  by  induction  on  k\ 

\\JjJk  -  {Jtfj:  FBk-  Q,\\  <  (2  -  2-*)6,  (6.1.2.5) 

where  e  5F+(x,)  is  the  generalized  Jacobian  of  Proposition  5.2.2  which  models  x*.  -x,,  and  Jk  €  dF+{xk) 
is  the  generalized  Jacobian  chosen  so  that  {Jk)i  =  0,  for  the  indices  t  6  /°(xfc)  as  in  the  definition  of  the 
bounded  deterioration  property  for  generalized  Hessians. 

Ikfc  +  i -x.ll  <  ^||xfc -x.||.  (6. 1.2.6) 

For  fc  =  0,  (6. 1.2.5)  is  satisfied  by  assumption  in  the  hypothesis.  We  show  (6. 1.2. 6)  in  the  induction  step, 
which  is  comparable  to  the  case  for  fc  =  0.  Assume  (6. 1.2. 5)  and  (6. 1.2.6)  hold  for  fc  =  1,2..  ..,/  -  1. 
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For  k  =  I,  by  the  bounded  deterioration  property  assumption,  and  the  two  induction  hypothesis,  then  the 
selected  generalized  Jacobians,  J*,  which  models  xj  —  x,.  and  J;  satisfy: 

1|J7J,  +  S,-Q.||  <  {2-2-('-^))5  +  a||x;_i  -x.ll,  (6.1.2,7) 


From  (6. 1.2.6)  and  given  that  ||xo  -  x*||  <  e,  then 

-  x,||  <  2“^'“‘)||xo  -  x.ll  <  2“^'~''e. 

Substituting  this  into  (6.1. 2.7)  and  using  (6.1. 2.4)  yields 

\\JTJi  -  +  Si  -  Q.w  <  (2  -  2-('-'^)6  +  a2-('-*>e 

<  (2  -  +  2-')6  =  (2  -  2“')<5. 

which  verifies  (6. 1.2. 5).  To  verify  (6. 1.2.6),  first  show  that  Ji  +  Bi  is  invertible  so  that  the  iterates 
are  well-defined.  Given  by  hypothesis  that  any  generalized  Hessian  in  dV/(x, )  is  invertible  and  satisfies 
|1//“H2;.)||  <  P,  from  (6.1.2. 5)  and  the  induction  hypothesis  (6.1. 2. 7), 

\\H-Hx,)iJlJi  +  Si  -  //{x,)||  <  ||,f/-‘(x.)||  WiJlJt  +  Bi-  <  P[2  -  2-')fi  -  206  <  i. 

So,  from  the  Banach  lemma  on  invertibility  of  perturbed  linear  operators,  Jf  Ji  +  Bi  is  invertible  and  satisfies: 


I1(T/j,  +  s,)"NI  < 


3 


Z3 


(6.1.2.8) 


1  -  \\H-^{x,)(Jf  3i  +  S,  -  H{x,))\\  -  1  -  1  2  • 

Thus,  i(+i  is  well  defined  and  for  the  generalized  Jacobian  J*  €  dF+(x.)  which  models  x;  -  x,: 

Ik(+1  -  x.ll  <  WiJlJi  +  s,)-Ml  [II  -  v/(x.)  +  v/(x,)  +  h:(x.  -  x,)||  +  WiJlJi  +  Si  -  h:\\  ||(x.  -  x,)||]  , 

where  H*  :=  {J‘)^ J*  +  Q,  is  the  generalized  Hessian  which  models  X|  -  x,.  From  Lemma  3, 

II  -  V/(x.)  +  V/(x,)  -h  Htix.  -  x,)||  < 


Substituting  (6.1. 2.5),  (6. 1.2.8),  and  from  above; 

3 


I|xi+1  -  x.ll  < -/?  [^|||xi -x.ll  +  (2 -2  ‘)(5  1|X|-X.||. 


From  (6. 1.2.6),  the  hypothesis  that  ||xo  -x.||  <  e,  and  (6. 1.2. 4),  then 

7||x,-x.||  ^o-i+i  < 

n  —  ^  —  o’ 


(6. 1.2.9) 


which  substituted  into  (6. 1.2.9),  gives: 


l|Xi+l  -  x.ll  <  -3 


<\-i 

— +  2-2-‘ 


<511x1  -  x.| 


<  3/?6||xi  —  x.| 
,  l|xi  -  x.ll 
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with  the  final  inequality  coming  from  (6. 1. 2.4).  This  proves  (6. 1.2.6)  for  the  induction  step,  /.  Using  these 
two  induction  results,  it  follows  that  the  iterates  converge  q-linearly  to  x,.  |1 

Dennis  and  Walker[1981)  show  the  conditions  for  the  smooth  case  for  the  bounded  deterioration  property 
to  be  met.  First  of  all,  the  set  of  matrices  from  which  to  choose  the  least  change  update,  B*,  should  include 
the  matrix  Q,.  Since  Q,  is  symmetric,  positive  semi-definite,  the  symmetric  secant  update  (6. 1.2.1)  satisfies 
this  condition.  The  other  condition  is  whether  there  is  some  k  such  that  for  all  fc,  y*  satisfies: 

||yf  -  Q.fxfc+i  -  xib)||  =  ||(Jfc+i  -  JkfF+(xk+i)  -  Q,(xfc+i  -  xOll 

(6.1.2.10) 


<  Kmax{||xfc  -  x,|l,  ||x*:+i  -  x,||}||x*+i  -  Xfc||. 

Notice  that  (6.1.2.10)  is  similar  to  the  inequality  in  Lemma  4,  which  states  the  existence  of  a  >  0  such  that 


for  Q*,  t/kf 


||(Jfc+i  -  -  Q,(xk+i  -Xfc)||  <  cr(i|xjt  -x.ll  -I-  ||xfc+i  -  x,||)llxfc+i  -  Xfc||. 


For  the  smooth  case,  where  /  is  ,  the  authors  show  that  the  choice  of 

y*  :=  V/(x*;+i)  -  V/(xit)  -  j'^iXk  +  l)J{Xk+l)iXk+l  -Xk) 


always  satisfies  the  bounded  deterioration  property.  However,  they  cite  that  update  (6. 1.2.1)  works  just  as 
well  in  test  cases.  Update  (6.1.2. 1)  is  used  in  NL2SOL  algorithm,  Dennis,  Gay,  and  Walker[1981].  This 
algorithm  is  a  hybrid  algorithm  which  uses  the  model  trust  region  method  (Levenberg-Marquardt),  with 
a  zero  matr'x.  Bo,  unless  the  secant  update  gives  a  better  match  between  predicted  and  actual  reduction. 
The  choice  of  comparison  of  generalized  Jacobians  and  the  global  convergence  of  x*  -♦  x,  may  assure  this 
condition  (6.1.2.10)  can  be  met  in  practice.  For  the  smooth  case,  the  authors  show  that  NL2SOL  performs 
better  in  test  cases  than  the  Levenberg-Marquardt  algorithm,  especially  for  the  non-linear,  large  residual 
case. 


Dennis  and  Walker[1981,  Theorem  3.3|  show  for  the  smooth  case  that,  if  the  iterates  converge  q-linearly, 
then  the  iterates  converge  q-superlinearly  if  and  only  if 


lim 

k~*oo 


\\(JlJk  -  {Jtfj;  +  Bk-  (?.)(x*+i  -  X*) 


=  0 


(6.1.2.11) 


||xik+l  -Xfcll 

6.1.3  Adaptation  of  Gauss-Newton  for  Convex  Inequalities.  If  the  residuals  are  small,  it  may  be 
advantageous  to  use  the  Gauss-Newton  method  for  the  nonlinear  inequality  case,  since  the  computation  is 
somewhat  simpler.  We  adapt  the  local  convergence  result  for  the  case  of  inequalities  of  convex  functions, 
combined  with  the  non-linear  equalities. 


Theorem  7.  Local  Convergence  of  Gauss-Newton  for  Solving  Least  Square  Solution  of  System  of  Inequalities 
of  Convex  Functions  and  Non-linear  Equalities. 


Let  Fi(x),  F2(x)  be  given  as  above  and 

/(x)  :=  /i(x)  +  /2(x) i(F,(x),  F,(x))  +  i(F+(x),  F2^(x)), 


.S8 


and  assume  F2  is  convex,  and  that  F\  is  C^.  Assume  the  following: 

1.  /i(x)  is  continuously  differentiable  in  an  open  subset,  D  C  3?"  and  let  J\{x)  be  Lipschitz  continuous  on 
D,  of  some  rank  71. 

2.  There  exists  x,  6  D,  such  that  V/(x»)  =  0. 

Assume  that  on  the  identification  neighborhood  N{x„r)  c  D,  that  the  following  are  satisfied: 

3.  Ji  is  bounded  on  N{x,,r). 

4.  There  exists  cr  >0,  such  that,  for  all  x  e  Nlx,,r),  for  any  choice  of  generalized  Jacobian,  J2ix),  J2(x, ), 

||(Ji(x)  -  Mx.)fFi(x,)  +  (Mx)  -  J2{x,)fF^{x,)\\  <  a|lx  -x.||.  (6.1.3.1) 

5.  A,  the  smallest  eigenvalue  of  J^(x,)J(x,),  for  any  choice  of  generalized  Jacobian,  satisfies: 

X  >  a. 

Then  there  exists  c  6  (1,  ^),  and  there  exists  e  >  0  such  that  for  all  x  6  Af(x,,  e)  and  for  any  choice  of 
generalized  Jacobian 

||(J^(x)J(x))-‘||  <  £ 

and  for  all  xq  €  N{x,,i),  the  Gauss-Newton  iterates 

Xfc+i  =  Xfc  -  {J(ifc)^J(xik))~‘ J(x/fc)^F(xfc) 

are  well-defined,  converge  to  x,  and  satisfy,  for  some  7  >  0  and  for  all  a  where  ||J(x)l|  <  a,  for  all 
X  e  N(x,,  r): 

l|xfc+i  -  x.ll  <  y  Ikt  -  ar.ll  +  ^11^*  “  (6. 1.3.2) 

l|xfc+i  -  x.ll  <  -  X.  I  <  ||xt  -  x.||.  (6.1.3.3) 

Proof.  Since  any  generalized  Jacobian  satisfies  that  IK J(xt)^J(xfc))“^||  <  j  and  since  c  >  1,  then 
Proposition  1.3.1  assures  a  neighborhood  of  x,,  such  that  ||( J^(x)J(x))~'|l  ^  x-  By  1,  since  Ji  is  Lipschitz 
continuous  on  N{xt,r)  of  some  rank  7,  and  since  F2  is  C~  convex,  choose  e  small  enough  to  insure  a 
Proposition  5.2.1  identification  neighborhood.  Then  Proposition  5.2.2  holds  for  some  72.  Thus,  ther"  is 
7  =  7i  +  72  >0,  such  that  any  generalized  Jacobian  of  F  at  x  €  N{xt,  r)  satisfies: 

||F(x.)  -  F(x)  -  J(x)(x.  -  x)||  <  £||x,  -  x||^. 

The  proof  of  the  theorem  follows  directly  the  pattern  of  Theorem  1.  || 

If  the  residual  of  F(x,)  is  zero,  just  like  in  the  linear  case,  convergence  is  q-quadratic.  This  is  similar 
to  the  result  of  Burke(1983]  for  his  Gauss-Newton  like  projected  step®.  He  also  discusses  the  Mangasarian 


Fromowitz  Constraint  Qualification  (MFCQ)  at  a  given  point  x.  The  MFCQ  for  a  given  point  x  is  satisfied, 
if  there  exists  a  direction  p,  such  that 

(VF2(x],p)  <  0  where  i  €  /^(x)  U  F^(x), 

{VFj(x),p)=0  where  t  =  1, ml,  (MFCQ) 

and  {VFJ(x)|t  =  1, . . . ,  ml}  are  linearly  independent. 

Robinson  showed  that  if  a  point  x  satisfies  the  MFCQ  condition,  then  the  condition  is  satisfied  for  all  points 
in  some  neighborhood  of  x.  Using  this  result  and  a  result  of  Daniel[1973],  Burke  showed  that  his  Gauss- 
Newton  directions,  starting  from  a  point  in  an  identification  neighborhood  of  a  stationary  point  x,  having 
/(x,)  =  0,  where  the  MFCQ  holds  for  some  other  point  x  in  this  neighborhood  and  /(x)  =  0,  then  the 
iterates  converge  q-quadratically  to  x, . 

The  local  convergence  behavior  of  Levenberg-Marquardt  directions  for  the  general  convex  case  is  compa¬ 
rable  to  Gauss-Newton.  The  applicability  of  either  of  these  methods  depends  on  how  nonlinear  the  function 
F  is  at  X*  and  on  the  size  of  the  residual  at  x,,  as  is  discussed  in  section  (4.1.1). 

6.2  Global  Convergence  Results 

Both  global  convergence  results  are  applicable  as  in  the  case  of  linear  inequalities.  The  Armijo  line  search 
follows  directly  without  adaptation,  since  V/2(x)  is  Lipschitz  continuous  of  some  rank  7  on  the  lower  level 
set  of  the  first  iterate,  since  Fj  is  convex  and  C^.  By  assuming  that  V/i(i)  is  Lipschitz  continuous  of  some 
rank  on  the  lower  level  set  of  the  first  iterate,  then  the  Armijo  line  search  conditions  are  satisfied. 

Similarly,  the  global  trust  region  method  using  generahzed  Hessian  model  function  also  follows  directly. 
Since  F2  is  convex  and  C^,  then,  in  an  identification  neighborhood  of  any  accumulation  point,  the  Hessian 
model  function  converges  quadratically  to  f2-  By  assuming  that  Fj  is  C^,  the  conditions  of  the  generalized 
Hessian  model  for  the  global  trust  region  method  are  satisfied. 
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7  Applications 


The  least  square  algorithm  developed  in  previous  sections  can  be  used  to  solve  the  assignment  problem 
subproblem  (2.5.1)  as  well  as  other  common  optimization  problems,  such  as  linear  programming  problems 
and  the  linear  complementarity  problem.  In  section  7.1,  we  develop  algorithms  to  solve  the  piece- wise  linear 
assignment  problem  in  section  2.  The  algorithm  calls  subroutines  based  on  Han's  algorithm  for  solving 
systems  of  linear  inequalities  in  a  least  square  sense  and  the  algorithms  developed  in  section  4  for  solving 
systems  of  linear  inequalities  and  nonlinear  equations.  The  local  minimum  stationary  point  returned  as  a 
solution  to  the  subproblem  (2.5.2)  may  not  >ield  a  global  minimum  of  the  least  square  formulation.  We  show 
that  non-global  local  minimum  stationary  points  are  either  associated  with  a  w-component  vertex  which  is 
not  a  possible  {0, 1}  solution,  or  there  is  no  solution  with  integral  w-component.  In  the  latter  case,  one  can 
form  cutting  planes  and  reformulate  the  relaxed  problem.  Based  on  the  assumption  that  proper  starting 
points  can  be  found  for  each  iteration  (corresponding  to  vertices  of  the  w-component  where  Wj  =  0  or  1),  the 
algorithm  terminates  finitely  in  a  solution  with  integral  w-component  or  finds  that  no  solution  exists.  The 
global  convergence  property  of  the  least  square  algorithms  in  section  4  gives  the  potential  for  fewer  than  2" 
iterations.  In  section  7.2,  we  show  how  the  the  relaxed  linear  programming  problem  can  be  formulated  as 
system  of  linear  inequalities  and  equalities  which  can  be  solved  using  Han’s  algorithm.  Combined  with  the 
nonlinear  equations,  the  problem  represents  the  peice-wise  linear  assignment  problem  subproblem  (2.5.1) 
which  can  be  solved  by  the  algorithm  developed  in  section  7.1. 
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7.1  Solving  the  Piece- wise  Linear  Assignment  Problem 

Recall  that  a  solution  w.  y.  z  of  the  relaxeti  primal  generated  by  r.  u,  v  which  solves  the  dual  problem  in  (2,2) 
IS  a  solutii  :  to  a  system  of  linear  inequalities  and  equalities  with  respect  to  the  following  index  sets  ; 


J  : 

=  01 

€  (0,  li} 

U'j0j  = 

E 

D.jJ/i.j  - 

./^  : 

=  OOj 

=  1} 

t 

=  Zj 

>  Wj0j 

.!<  : 

Okj 

=  0} 

<  U’j 

*T‘ 

II 

■0 

r;- 

~  {<1  - 

r,j  t  u. 

-Cx.jrj  <  0} 

o 

II 

/;  : 

,  >0} 

yx.j  = 

Here  w.  y,  c  satisfy: 

y,.,  =0,  Vie/;\Vy 
yt.j>0,  Vi,y 

J 

yx.j  =  Wj, 

yi.]<Wj,  'ii,j 


VI 

3 

(7.1,1) 

o 

II 

1 

X 

Vj  e  J" 

X 

vj  e 

^  ^  ^  —  WjPj, 

Vj  e  J^. 

Let  P  -  (I  u'.  y,  i)|  which  satisfies  (7.1.1)}  be  the  nonempty  convex  polyhedral  relaxed  primal  solution  set 
generated  by  a  single  dual  solution.  (Note  that  z  is  an  auxiliary  variable  and  would  not  be  used  in  the 
computation  )  Consider  P  as  the  non-empty  convex  polytope  in  (by  eliminating  the  variable  r). 

.All  points  in  P  have  Wj  €  [0. 1].  As  in  section  2.5,  use  the  single  variable  x  to  represent  the  variable 
'  >c.  y  i  Define  the  function  F\  to  represent  the  nonlinear  forcing  function  of  w,  where  (Fi)j  :  3?  i— >  3?.  and 
!.'''i(j-i)j  :  -  u'j(l  -  Wj).  Let  the  matrix  A  and  the  vector  b  represent  the  linear  transformations  in  the 
inecpialities,  and  the  matrix  C  and  the  vector  d  represent  the  linear  transformations  in  the  equalities.  The 
least  .s<piare  algorithm  for  solving  systems  of  non-linear  equations  and  linear  inequalities  returns  a  local 
minimum  stationary  point  x.  which  solves  in  a  least  square  sense: 

Fi(x)  =0, 

Ax  <6,  (7.1.2) 

Cx  =  d. 
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Now,  define  F  ;  where  p  =  m  +  4n  +  2mn  +  J2j(\^j\  +  (The  notation  |/°|  means  the 

cardinality  of  the  set  Ij.  )  Then 

/  F.(x)  \  /F,(x)\ 

Fix)  ~  (Ax-b)^  =  F2(x)  . 

\{Cx-d)J  \F3{x)J 

Let  /  :  SR"*'""  be  defined; 

fix)  :=  /:(x)  +  /2(x)  +  /3(x),  where 
/i(x);=^(F,(x),Fi(x)), 
f2{x)  ■■=  i((Ax-6)+,(>lx-6)*), 

/3{x):=^{(Cx-d),(Cx-d)). 

Observe  that  the  w-component  vertices  of  [0, 1]”  are  determined  by  Wj  =  0  or  1  for  each  j.  A  convex 
subset  of  [0, 1]"  associated  with  each  vertex  is  determined  by  0  <  tCj  <  |-  or  |  <  1,  for  each  j.  There 

are  2"  of  these  subsets,  one  associated  with  each  w-component  vertex,  The  minimum  of  /j  (x)  on  each  of 
these  subsets  is  attained  at  the  associated  w-component  vertex.  Points  having  W;  =  5  for  some  j  are  not 
identified  with  any  of  the  vertices.  We  show  the  following  property  about  local  minimum  stationary  points 
of/. 

Proposition  7.1. 

If  X.  is  a  local  minimum  stationary  point  of  /,  then  x,  satisfies; 

1. )  If  X,  e  P,  then  ic*  €  {0,  5, 1},  for  all  j.  If  fix,)  —  0,  then  ic*  6  {0, 1},  for  all  j. 

2. )  If  X,  ^  P  and  if  lu*  =  ^  for  some  j,  then  for  all  x  €  P,  the  only  feasible  choice  is  ^  for  these  j. 

Proof.  The  first  result  uses  the  property  that  V/2(x,)  +  Vfiix,)  =  0,  for  x,  e  P.  The  solutions  of 
V/i(x)  =0  satisfy; 

«;*{!  -  -  w'j)  =  0, 

which  implies  €  {0, 1,  j}.  Then  if  fix,)  =  0,  it  must  be  that  w*  €  {0, 1}. 

For  the  second  result,  in  order  that  x,  is  a  local  minimum  stationary  point,  then  V/(x. )  =  0.  If  there 
exists  a,P  such  that  0<a<^  <^<  1,0?^/?,  and  ja, is  feasible  for  Wj,  then  x.  is  not  a  local 
minimum  stationary  point.  Rather,  the  tc*  component  is  a  local  maximum  stationary  point  with  respect  to 
this  component.  Otherwise,  suppose  there  exists  o  <  /?  ,  wher^  either  a,  /3  €  [0,  or  a,  f3  e  (:j,  1]  and  [q.  3] 
is  feasible  for  Wj.  Consider; 


evaluated  at  Wj  — 


(V/(x))j  =  iWj  -  q)  +  U)]i\  -Wj)  -Wjil  -  Wjf 

=  (i-«)<0. 
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This  means  x.  with  w*  =  -  is  not  a  local  minimum  stationary  point.  Similarly, 


(V/(x))j  =  (Wj  -/?)  +  U)j(l  -  Wj)  -  Wj(l  -  Wjf 

evaluated  at  Wj  — 

=  (i-«>0. 

Again,  this  means  x.  with  ic*  =  j  is  not  a  local  minimum  stationary  point.  Because  of  the  convexity  of  P. 
these  are  the  only  possible  types  of  feasible  regions,  and  so  the  only  possibility  for  x,  to  be  a  local  minimum 
stationary  point  with  Wj  =  ^  for  some  j  is  that  the  only  feasible  choice  for  these  Wj  is  JCj  =  || 

This  proposition  assures  that  local  minimum  stationary  points  with  any  w*  =  \  represent  a  condition 
where  there  is  no  point  with  w  integral  in  P.  Further,  any  other  local  minimum  stationary  point  has  all 
w‘  ^  Thus,  in  the  case  that  /(x. )  ^  0,  one  can  identify  a  vertex  of  the  w-component  associated  with 
this  point,  or  if  any  of  the  w*  =  then  no  solution  with  integral  w-component  exists.  It  follows  in  the 
case  where  /(x,)  ^0  where  no  w*  =  ^  that  the  corresponding  w-component  vertex  is  not  the  w-component 
vertex  of  a  possible  solution. 

The  cutting  plane  method  to  solve  integer  programming  problems  originated  in  the  work  of  Gor- 
mory[l  960,1963)  and  is  described  in  elementary  texts  such  as  Garfinkel  and  Nemhauser[1972].  Conn  and 
Comuejols[1990],  Cornuejob  et  al[1989j,  and  Ahn  et  al[1988)  describe  the  cutting  planes  that  generate  facets 
of  the  uncapacitated  facility  location  polytope.  This  polytope  is  the  convex  hull  of  the  solution  set  of; 

min  ^  Zj 
i 

subject  to; 

-<t>jWj  -I-  Zj  >  0,  Vj 

-  ^  0- 
i 

=  Vi 
J 

l/i.j>0,  Vi,j 

Wj  €  {0,  1 },  Vj. 

The  following  cutting  plane; 

yo,f  +  yp,h  +  yp.i  +  j/q.i  +  yq,k  +  yo,k  -  w^  -  wi  -  wk  <  \  (7.1.4) 

defines  a  facet  of  the  uncapacitated  facility  location  polytope  for  any  hj,k  €  J,  and  o,p,q  g  /,  such  that 
h  ^  I  ^  k  and  o  ^  p  ^  q.  It  cuts  off  fractional  basic  solutions  of  (2.1.2)  the  relaxed  primal  solution,  where  all 
the  variables  in  (7.1.4)  take  the  value  The  authors  show  examples  of  using  these  special  cutting  planes, 
where,  after  Just  a  few  iterations,  a  {0, 1}  solution  is  feasible. 
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We  describe  an  algorithm  to  find  a  {0, 1}  solution  for  the  problem  in  section  2.  Define  LLS{P)  to  be 
Han’s  algorithm  which  returns  a  least  square  solution  x*  of  P  define  by  (7.1.1),  as  well  as  residuals  r,.  Define 
NLLS(f)  to  be  the  algorithm  which  returns  a  local  minimum  stationary  point,  i.,  of  /  generated  by  (7.1.2), 
which  is  a  least  square  solution  to  a  system  of  non-linear  equations  and  linear  inequalities.  The  algorithm 
calls  both  LLS(P),  NLLS(f).  Consider  I  e  V,  where  initially  V  =  {1,2,  ...,2"}  represents  the  vertices  of 
the  w-component  of  x.  For  each  I  e  V,  choose  an  initial  starting  point  with  the  associated  w-component 
vertex,  selecting  =  0  or  1  appropriately.  If  F  contains  points  having  w-component  in  this  subset  of  [0,  Ij" 
associated  with  the  particular  w-component  vertex,  then  if  one  chooses  a  starting  point  in  P,  identified  with 
this  w-component  vertex,  then  the  algorithm  NLLS(f)  should  converge  to  a  point  with  the  same  identified 
w-component  vertex.  Thus,  one  way  to  choose  the  starting  points  for  NLLS(f)  is  to  call  Han's  algorithm  to 
find  a  feasible  point  in  P  and  associating  iCj  =  0  or  1  depending  on  the  closest  endpoint.  However,  after 
several  iterations,  one  may  not  find  different  vertices  based  on  points  in  P.  Another  approach  is  to  choose 
a  starting  point  based  on  a  particular  w-component  vertex  where  the  y-component  of  the  point  xq  satisfies 
the  conditions  in  (7.1.1)  in  a  least  square  sense.  Then,  call  NLLS(f)  with  the  starting  point  and  test  the 
conditions  of  Proposition  7.1.  If  /(x, )  =  0,  then  the  algorithm  has  found  a  solution  where  the  w  component 
is  integral.  If  any  of  the  w*  =  j,  then  the  algorithm  stops  and  using  cutting  planes  described  by  Conn  and 
Cornuejols[t990),  a  new  problem  is  formulated,  feasibility  is  restored,  and  a  new  set  P  is  formed.  Then,  one 
can  again  run  the  algorithm  below  with  this  new  set  P.  Lastly,  if  x,  ^  P,  then  none  of  the  ic*  =  |  and  there 
is  an  associated  w-component  vertex  of  x,.  If  this  vertex  is  the  same  as  that  of  the  initial  point,  then  this 
vertex  is  removed  from  the  list  V .  If  the  the  initial  vertex  is  different,  then,  since  the  initial  starting  point 
was  chosen  with  its  w-component  in  this  subset  of  (0, 1]”  with  y-component  as  close  as  possible  to  feasibility, 
then  one  can  remove  this  initial  starting  point  vertex  from  the  list  as  well.  Because  of  the  convexity  of  P,  if 
there  are  any  points  of  P  having  w-components  in  this  subset,  then  the  algorithm  NLSS  should  find  this  local 
minimum  with  its  w-component  within  this  subset.  Finally,  the  algorithm  iterates  selecting  another  vertex 
from  the  list  V .  If  at  any  iteration,  both  endpoints  of  some  Wj  are  found  to  be  infeasible,  then  Gomory 
cutting  planes  would  be  developed  to  cut  off  non-integral  solutions,  the  relaxed  program  reformulaoed  and 
using  a  new  set  P,  the  algorithm  would  be  rerun. 

The  algorithm  is  shown  below: 

Algorithm  using  NLLS  and  LLS  : 

V^  =  {1,2,...,2"}; 

P  is  the  set  defined  by  (7.1.1)  or  by  cutting  planes; 

ITER: 

SELECT  1  from  V;  (Choose  vertex  from  V  ^  9) 

SET  xo;  (calling  LLS{Py)  with  set  w-component  and  y-component  as  variable.) 
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X,  =  NLLS{f)\  (where  /  is  the  function  defined  by  the  set  P.) 

IF  /(x.)  =  0,  THEN  done;  (x.  is  an  integral  solution) 

IF  ANY  w*  =  THEN  quit;  (Use  cutting  planes  to  form  new  relaxed  program  and  new  set  P) 

(Thus,  X*  is  infeasible  and  no  w*  =  i) 

1  =  w-component  vertex  of  xq; 

V  =  U\{/};  (Remove  initial  vertex  from  V.) 

/,  =  w-component  vertex  of  x. ; 

IFi^l.,THEN  V  =  Y\{i.}; 

FOR  j  =  1  TO  n;  BEGIN; 

IF  lOj  =  0  or  1  both  infeasible  THEN  quit; 

(Use  Gomory  cutting  planes  to  form  new  relaxed  program  and  new  set  P.) 

END; 

GO  TO  ITER; 

This  algorithm  terminates  finitely  if  an  integral  w-component  solution  exists  in  the  set  P.  At  each 
iteration,  at  least  one  different  vertex  is  removed  from  the  list.  The  above  algorithm  chooses  all  starting 
points  for  calling  NLSS  based  on  an  associated  w-component  vertex,  setting  the  lUj  =  0  or  1  depending  on 
the  vertex.  If  a  point  in  P  has  this  w-component  vertex,  then  LSS  algorithm  would  find  a  solution  with 
this  integral  w-component.  If  not,  then  the  point  returned  from  LSS  gives  the  least  square  solution  in  the 
y-component  having  this  associated  w-component  vertex.  If  there  are  points  in  P  having  w-component  in 
the  subset  associated  with  this  w-component  vertex,  then  NLSS  (using  this  starting  point)  finds  the  local 
minimum  stationary  point  with  the  same  associated  w-component  vertex.  If  the  starting  point  is  the  least 
square  solution  of  the  y-component  with  a  given  w-component  vertex,  then  if  the  NLSS  algorithm  converges 
to  a  point  having  a  different  w-component  vertex,  then  one  can  eliminate  both  w-component  vertices  from 
consideration.  Thus,  it  is  possible  that  two  different  vertices  are  removed  at  an  iteration.  If  after  sufficient 
number  of  possible  w-component  vertices  are  eliminated,  one  finds  for  some  j  such  that  Wj  =  0  or  1  are 
both  eliminated,  or  if  the  NLSS  algorithm  finds  a  local  minimum  stationary  point  with  some  ic*  =  j,  then 
no  point  in  the  set  P  has  integral  w-component.  One  would  then  form  cutting  planes  and  a  new  set  P  and 
rerun  the  algorithm. 

This  algorithm  calls  LSS  at  each  iteration  in  finding  starting  points  and  calls  NLSS  once  each  iteration. 
The  global  convergence  property  of  the  algorithms  in  section  4  provides  the  potential  that  fewer  than  2" 
iterations  are  needed. 
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7.2  Solving  Linear  Programming  Problems 

Han[1980j,  Mangasarian[19811,  and  Stewart(1987|  describe  using  iterative  continuous  algorithms  to  solve 
linear  programs.  Specifically.  Han’s  algorithm  could  be  used  to  solve  a  linear  complementarity  problem 
which  takes  .  j  same  form  as  the  solution  set  of  a  linea'-  program.  One  might  also  use  this  algorithm  to 
solve  the  relaxed  linear  program  described  in  section  2.  Using  the  duality  property,  the  optimal  solution  of 


the  primal  problem  (2.1.2)  and  the  optimal  solution  of  the  dual  problem  (2.2.1)  must  satisfy: 

3  * 

3 

in  addition  to  satisfying  all  the  constraints  in  both  problems.  Therefore,  one  could  formulate  a  convex  set, 

P,  in  the  space  of  the  combined  primal  and  dual  variables,  where  P  satisfies: 

M 

d' 

IV 

M 

3  * 

j 

-  Vij  >  0, 

Vi,  j 

—(j>jWj  f  Zj  >  0, 

vj- 

Vj 

II 

Vi 

Vi.]  >  0. 

Wj  <  1, 

(7.2.1) 

o' 

II 

1 

Vj 

^i,j  “b  ttj  ^  0, 

Sj+rj  =  1, 

Vj 

Vi.j  >  0, 

Vi,j 

Sj  >  0, 

Vj 

d' 

IV 

p 

Vj 

IV 

p 

V> 

The  solution  set  P  represents  all  the  solutions  to  the  relaxed  linear  programming  problem,  unlike  using  a 

single  dual  variable  to  generate  a  reduced  solution  set.  One  may  consider  using  P  generated  by  (7.2.1)  and 

the  algorithm  to  solve  for  an  integral  solution.  This  avoids  the  iterations  needed  to  find  other  dual  solutions, 
since  the  entire  solution  set  of  the  relaxed  problem  is  characterized  by  (7.2.1). 
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8  Examples  of  Algorithm 

For  the  example  problem,  the  strategy  as  suggested  in  7.2  is  used  to  characterize  the  full  set  of  solutions 
of  the  dual  problem  and  search  within  this  set  for  a  {0, 1}  solution.  Recall  from  section  2,  the  full  set  of 
solutions  of  the  example  problem  satisfy:  yi  j  =  0,  except  as  follows: 

,  3 

yi.l  =  Wl  =  1,  J/2,3  =  J/4,3  =  tC3  =  1.  y3.3  =  TT, 

44 

and  icj, 2/3,2,  and  2/3,1  satisfy: 


11)2  1,  2/3  1  >  — , 

^  ’  -  40 

41  4 

2/3,2  ~  ^  ~  2^3,1.  0  <  2/3,2  <^2<  22/3.2- 

One  can  simplify,  eliminating  the  variable  2/3,1  which  is  fully  determined,  and  expr^-'s  as  a  problem  in  two 
variables,  w  and  y. 

-y<0, 

43  ^ 

y - <0, 

.55  -  ’ 

y  —  w  <0,  (81) 

4 

w - y  <  0, 

T  ~ 

w-l  <0. 

Expressing  /(x)  =  J{w,y): 

fix)  =  -  w-’)^  +  i-y)l  +  iy- 

+  iy-  »j)l  +  (w-  ^y)l  -f  (u;  - 

See  the  contour  plot  of  f(x)  shown  in  figure  3  below,  where  one  observes  there  are  local  minina  stationary 
points  hi  w  -  0,  y  =  0,  and  w  =  F  2/  S  [f,  Ifl 


^2 

55^^ 


1)1). 


(8.2) 
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For  this  problem. 


{  ""  (l\ 

0  1  i 

,4=  -11,  6=  0 

1  0 
V  1  0  /  V  1.  / 

An  example  using  the  Gauss-Newton  like  algorithm  to  solve  (7.3.2)  is  illustrated,  using  initial  starting  point 
iQ  =  (w,y)'^  =  (1,1)'^. 

Initial  Iteration:  /(xq)  =  /°(xo)  =  {3.5},  /“(xq)  =  {1,4},  and  I^{xo)  =  {2}. 


V/(xo)=(^pj,  J,(xo) 


0  0 


Mxo) 


0  0 

0  1 

-Q  a 

0  0 

^3  0 


where  a  6  [0, 1]  and  /3  €  [0, 1|.  Selecting  the  generalized  Jacobian  d2(xo)  €  d(Axo  -  b)  +  ,  where  q  =  J  =  0, 
one  calculates: 

J^(xo)J(xo)  =  jf(xo)di(xo)  +  Jj {xo)J2{xo)  -  (o  ~  -W(^o)  =  • 


Iteration  1:  Choosing  stepsize  Aq  =  I,  then 


■(«)' 


V/(x,)  = 


so  xi  solves  (7.3.2),  with  /(xi)  =  0.  Therefore,  xi  also  solves  the  assignment  problem. 

If  a  different  generalized  Jacobian,  J^ixo)  €  d(Axo  —  b)^,  is  chosen  where  a  =  ,3  =  1  above,  then 


J^(Xo)d(Xo)  = 


3  -I 
-1  2 


and  do  =  -( J’^(xo) J(xo))  ’V/(xo)  == 


-.0436 

-.1309 


Iteration  la:  Choosing  stepsize  Aq  =  I  gives 

/  0.9564  \  ,  /" -0.0381  N 

-  V  0.8691  j  ’  0.0873  )  ' 

/°(xi)  -  0,  /~(xi)  =  {1,3, 4,5},  and  /'‘‘(xi)  =  {2}.  Notice  that  for  x.  =  (l-lf)^,  that  Xi  is  in  an 
identification  neighborhood  of  x, ,  where  /°(x»)  =  {2,5},  /~(x,)  =  {1,3,4},  and  I'^(x,)  =  0.  Since  I'^{xi)  = 


-0.0381 

0.0873 


0,  then  J^ixx)  €  d[Axi  -  b)+  is  uniquely  determined: 


tT,  ^  ("0.8331  0\  .  .  /  0.0^ 

J^(xi)J(xi)  =  (  Q  and  =  -  (  q  (• 

1  gives  Iteration  2a: 

/  1.0021  \  ,  /"  0.0042  \ 

=  (o,7818  j  ■  0  ) 


Choosing  stepsize  Aj  =  1  gives  Iteration  2a: 


/°(x2)  =  {2},  I~(x,)  =  {1,3,4},  and  /^(x.)  =  {5},  Choosing  J2{x2)  G  (Ax2  -6)  +  ,  then 

/o  o\ 

J2(X2)==  0  0  gives  J^(X2)J(X2)  =  f  and  d2  =  (  q® 

0  0  ^  ^ 


Choosing  stepsize  A2  =  1  so  that 
,  /  -0.0021  ^ 
<'>  =  0 


gives  X3 


f  1  \ 

~  \  42  )  ’ 
\  55  / 


/(X3)=0,  V/(X3)  = 
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9  Conclusions 


We  have  developed  methods  based  on  continuous  optimization  to  solve  a  special  type  of  assignment  problem, 
having  piece-wise  linear  additive  separable  cost  functions.  This  problem  is  not  easily  solved  using  combina¬ 
toric  methods,  due  to  the  nonlinear  cost  function.  Further,  common  continuous  optimization  methods  do 
not  handle  the  {0, 1}  contraints.  The  strong  relaxation  of  the  {0, 1}  constraints  yields  a  linear  program  or, 
equivalently,  a  system  of  inequalities  and  equalities.  Adding  nonlinear  equations  to  this  system  to  force  a 
{0. 1 }  solution  yields  a  problem  which  we  showed  can  be  solved  using  least  square  methods.  We  developed 
algorithms  extending  methods  such  as  Gauss-Newton,  Levenberg-Marquardt,  Newton’s,  and  quasi- Newton, 
for  deriving  search  directions,  and  global  convergence  algorithms  based  on  the  model  trust  region  method 
and  inexact  line  search.  We  have  developed  the  details  of  these  new  algorithms  and  an  overall  theory  for 
local  and  global  convergence  for  both  linear  inequalities  as  well  as  systems  of  inequalities  of  convex  CP- 
functions.  This  work  in  non-differentiable  optimization  is  based  on  on  fundamental  properties  of  generalized 
differentiability.  We  showed  that  these  properties  are  useful  in  developing  theoretical  convergence  results  of 
these  new  algorithms  to  solve  a  special  type  of  problem,  least  square  solution  of  finite  systems  of  inequal¬ 
ities.  The  local  convergence  theory  uses  the  special  approximating  property  that  the  Taylor  series  based 
on  the  generalized  differential  constructs  possesses  in  an  identification  neighborhood  of  a  stationary  point. 
We  developed  and  verified  new  approximating  results  for  both  the  case  of  linear  inequalities  and  systems  of 
inequalities  of  convex  functions,  as  well  as  the  details  of  selecting  generalized  differential  constructs.  Prom 
this,  we  developed  algorithms  based  on  Gauss-Newton,  Levenberg-Marquardt  and  Newton’s  methods,  using 
generalized  differential  constructs  and  showed  that  local  convergence  properties  are  similar  to  the  smooth 
case.  We  developed  a  comparable  method  based  on  quasi-Newton  updates,  defined  a  bounded  deterioration 
property  for  generalized  Hessians,  and,  assuming  this  property,  verified  local  q-linear  convergence. 

The  global  convergence  of  the  Armijo  line  search,  which  we  showed  can  be  used  directly  with  no  mod¬ 
ification,  depends  on  the  Lipschitz  property  of  the  gradient  of  the  convex  least  square  function  on  lower 
level  sets.  We  extended  the  global  trust  region  method  to  handle  systems  of  inequalities  by  using  a  model 
function  based  on  generalized  Hessians  and  verified  convergence. 

We  developed  an  algorithm  to  solve  the  original  assignmer*^  problem  using  the  algorithm  for  solving 
systems  of  nonlinear  equations  and  linear  inequalities.  This  overall  algorithm  either  finds  a  {0, 1 }  solution 
within  the  relaxed  problem  solution  set,  or  it  determines  that  no  such  solution  exists,  thus  requiring  cutting 
planes  be  added  to  the  relaxed  problem  formulation.  We  developed  and  verified  a  property  of  local  minimum 
stationary  points  of  the  least  square  formulation  which,  along  with  proper  choice  of  starting  points  at  each 
iteration,  assures  that  the  assignment  problem  algorithm  converges,  that  is,  either  a  solution  is  found,  or 
it  is  confirmed  that  none  exists.  We  showed  how  to  formulate  the  relaxed  problem  as  a  system  of  linear 
inequalities  and  equalities  which  can  be  solved  using  Han's  algorithm. 

This  work  demonstrates  strong  solvability  properties  by  using  the  Euclidean  norm  for  regularizing  a 
non-differentiable  function  into  a  differentiable  function,  as  well  as  the  increased  solvability  of  transforming 
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a  discontinuous  problem  into  a  continuous  problem.  As  shown,  the  least  square  formulation  is  numerically 
stable  to  small  perturbations  of  computational  inaccuracies.  The  least  square  formulation  for  solving  the 
system  of  inequalities  and  equalities  yields  a  solution  even  if  the  system  is  infeasible.  The  least  square 
formulation,  although  not  equivalent  to  the  original  assignment  problem,  yields  important  information  about 
local  minimum  stationary  points  of  the  least  square  objective  function  for  the  assignment  problem.  Further, 
the  least  square  formulation  for  solving  systems  of  inequalities  provides  a  useful  tool  for  other  applications, 
such  as  solving  linear  programs  and  linear  and  non-linear  complementarity  problems. 

Further  work  would  address  more  general  functions,  especially  non-convex  functions.  Also,  further  work 
should  treat  the  invertibility  assumptions  made  for  local  convergence  results,  deriving  the  specific  conditions, 
such  as  the  MFCQ,  which  assure  invertibility,  similar  to  Burke[1983j,  Marker  and  Pang[1990),  and  Marker 
and  Xiao[1990).  In  addition,  the  specific  conditions  which  assure  the  bounded  deterioration  properly  for  the 
secant  update  in  the  quasi-Newton  algorithm  need  to  be  developed. 

Lastly,  the  theoretical  algorithms  developed  herein  should  be  implemented  and  tested  on  large  scale 
problems,  giving  a  basis  for  practical  comparison  with  other  algorithms  and  comparison  with  smooth  prob¬ 
lems. 
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